Data Engineer code challenge

This project is my solution to the code challenge for the Data Engineer position at Indicium. The complete description of the challenge can be found in this repository.

Necessary tools:

  • Python (tested in version 3.11)
  • pip (package manager)
  • Docker (docker-compose)

Usage

Follow these steps to run the pipeline:

  • Clone the repository and navigate into it.
  • In the repository folder, run pip install -r requirements.txt in a terminal.
  • Run docker-compose up -d in a terminal, then wait until the Postgres database has finished populating its tables. To check, run docker logs code-challenge_pg_db_1 and confirm that the last line looks like this:
    LOG: database system is ready to accept connections
  • After both containers are up, run python pipeline.py
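
The manual log check in the steps above could also be automated. The following is a minimal sketch, not part of the repository: it assumes the container is named code-challenge_pg_db_1 as in the steps above, and the helper names (last_line_is_ready, fetch_docker_logs, wait_for_postgres), the timeout, and the polling interval are hypothetical choices made for illustration.

```python
import subprocess
import time

# Message quoted in the steps above; the container name is the one used there.
READY_MESSAGE = "database system is ready to accept connections"
CONTAINER_NAME = "code-challenge_pg_db_1"


def last_line_is_ready(log_text: str) -> bool:
    """Return True if the last non-empty log line contains the ready message."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return bool(lines) and READY_MESSAGE in lines[-1]


def fetch_docker_logs(container: str = CONTAINER_NAME) -> str:
    """Return the output of `docker logs <container>` as one string.

    stderr is merged into stdout so that log lines written to either
    stream are captured.
    """
    result = subprocess.run(
        ["docker", "logs", container],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.stdout


def wait_for_postgres(fetch_logs=fetch_docker_logs, timeout=120.0, interval=2.0) -> bool:
    """Poll the logs until the ready message is the last line or the timeout expires.

    fetch_logs is injectable so the polling logic can be exercised without Docker.
    """
    deadline = time.monotonic() + timeout
    while True:
        if last_line_is_ready(fetch_logs()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_postgres() after docker-compose up -d and before python pipeline.py would block until the database reports that it is ready, and return False if the timeout is reached first.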

Result

Here is a screenshot of the full pipeline being executed:



Project information

  • Category: Data Analysis
  • Project date: 13 July, 2022
  • Project files: On GitHub