ETL Automation


This project produces aggregate data from raw, highly granular data about car shoppers. From the aggregates we can derive metrics such as how many shoppers there were on a given day in a DMA, in a region, in a DMA and region combined, or nationally. We use an ETL process to get this result. In general terms, each record in the raw data represents one shopper, and for each shopper we have many attributes, such as the region and DMA associated with that shopper.
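The aggregation described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual pipeline: the field names (`shopper_id`, `date`, `region`, `dma`), the sample records, and the helper `count_shoppers` are all assumptions made for the example. In a Pandas-based stack the same counts would typically come from a `groupby` on the same columns.

```python
from collections import Counter

# Hypothetical raw records: one dict per shopper (field names are assumptions).
raw_shoppers = [
    {"shopper_id": 1, "date": "2021-03-01", "region": "West", "dma": "Los Angeles"},
    {"shopper_id": 2, "date": "2021-03-01", "region": "West", "dma": "Los Angeles"},
    {"shopper_id": 3, "date": "2021-03-01", "region": "West", "dma": "Seattle"},
    {"shopper_id": 4, "date": "2021-03-01", "region": "South", "dma": "Atlanta"},
    {"shopper_id": 5, "date": "2021-03-02", "region": "South", "dma": "Atlanta"},
]


def count_shoppers(records, keys):
    """Count shoppers per day, grouped by the given fields.

    An empty `keys` list groups by day only, i.e. the national level.
    """
    return Counter(
        (rec["date"],) + tuple(rec[k] for k in keys) for rec in records
    )


# One aggregate per level named in the description above.
by_dma = count_shoppers(raw_shoppers, ["dma"])
by_region = count_shoppers(raw_shoppers, ["region"])
by_region_dma = count_shoppers(raw_shoppers, ["region", "dma"])
national = count_shoppers(raw_shoppers, [])
```

Each result maps a `(date, ...)` tuple to a shopper count, so `by_region[("2021-03-01", "West")]` gives the number of shoppers recorded in that region on that day.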


  • Django
  • Python
  • Plotly Dash
  • Flask
  • Pandas
  • NumPy
  • Airflow Framework
  • Jenkins

Key Technical Challenges:

  • Exploring the Airflow framework and writing new DAG files for ETL processes.
  • Creating complex GUIs in Python with the Plotly Dash framework.
  • Writing optimized APIs for various ETL processes.

Business + Technical Points:

  • Creating new DAG files with the Airflow framework for various functions and operations.
  • Implementing ETL processes in a large data warehouse.
  • Building a web application with the Plotly Dash framework.
  • Building periodic Jenkins jobs to handle all ETL functionality.