Traditionally, ETL has required managing complex systems in the cloud or in large data centers. This talk covers how Python and DuckDB, both open-source and resource-efficient, can handle ETL processes on standard computers or notebooks. The presentation gives an overview of typical ETL steps, introduces DuckDB, and demonstrates its practical application.
ETL stands for "extract, transform, load" and is, loosely speaking, shorthand for moving data from one system to another. Traditionally, this has often meant managing complex systems in the cloud or in large data centers.
Classic ETL followed the extract-transform-load order literally. Modern approaches based on data lakes often load the data first and transform it inside the database system (an order often called ELT), which is more efficient for large volumes. Contemporary tools also allow simpler implementations that do not need complex workflow systems or distributed infrastructure.
This talk covers how Python and DuckDB, both open-source and resource-efficient, can handle ETL processes on standard computers or notebooks. Python offers connectors for a wide range of backends, while DuckDB is a capable embedded OLAP database with data lake support. The presentation gives an overview of typical ETL steps, introduces DuckDB, and demonstrates its practical application, potentially including optimization strategies.