Data Engineer code challenge
In this project, I solved the code challenge for the Data Engineer position at Indicium. The complete description of the challenge can be found in this repository.
Necessary tools:
- Python (tested in version 3.11)
- pip (package manager)
- Docker (docker-compose)
Usage
Follow these steps to run the script:
- Clone the repository and navigate into it.
- In the repository folder, run
pip install -r requirements.txt
in a terminal.
- Run
docker-compose up -d
in a terminal, then wait until the Postgres database has populated its tables. To check, run
docker logs code-challenge_pg_db_1
and confirm that the last line looks something like this:
LOG: database system is ready to accept connections
- After both containers are up, run
python pipeline.py
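The manual log check in the steps above could also be automated. The following is a minimal sketch, not part of the original project: the helper names `is_ready` and `wait_for_postgres`, the timeout, and the polling interval are all hypothetical, and it assumes the container is named code-challenge_pg_db_1 as in the command above. It applies the same heuristic as the manual check (the last log line mentions that the database is ready) and then starts the pipeline.

```python
import subprocess
import sys
import time

# Text expected in the last log line once Postgres is up (taken from the README).
READY_MARKER = "database system is ready to accept connections"


def is_ready(log_text: str) -> bool:
    """Return True if the last non-empty log line contains the ready marker."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return bool(lines) and READY_MARKER in lines[-1]


def wait_for_postgres(container: str = "code-challenge_pg_db_1",
                      timeout: float = 120.0,
                      interval: float = 2.0) -> bool:
    """Poll `docker logs` until the container looks ready or the timeout expires.

    The container name, timeout, and interval are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["docker", "logs", container],
            capture_output=True,
            text=True,
        )
        # Server log lines may arrive on stderr, so inspect both streams.
        if is_ready(result.stdout + "\n" + result.stderr):
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_postgres():
        # Run the pipeline with the same interpreter that runs this script.
        subprocess.run([sys.executable, "pipeline.py"], check=True)
    else:
        raise SystemExit("Postgres did not become ready in time")
```

Saved under a name of your choice in the repository folder, this script would replace the last two manual steps. Because it only looks at the last log line, it is as approximate as the manual check it mirrors.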
Result
Here is a screenshot of the full pipeline being executed:


Project information
- Category: Data Analysis
- Project date: 13 July, 2022
- Project files: On GitHub