Architecture of TRACKTICS data analysis pipeline

Data analysis pipeline

The TRACKTICS Pipeline is a framework for orchestrating parallel batch-processing jobs organized as DAGs (directed acyclic graphs). It uses Docker for artifact deployment, ECS (Amazon EC2 Container Service) for container orchestration and resource discovery, Luigi (by Spotify) for workflow and dependency management, SQS (Amazon Simple Queue Service) as a job queue, a custom service for auto scaling, S3 and Postgres (Amazon RDS) for storing results, and Kinesis for logging. At the service level, our toolbox features a universal Python wrapper that exposes the functionality of each service to ECS and the workflow manager in a consistent way. A blog post about how we developed our pipeline is in preparation.
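
To illustrate what scheduling a DAG of batch jobs involves, here is a minimal sketch that groups tasks into waves that could run in parallel once their dependencies have finished. It uses only Python's standard-library `graphlib` and is not the Luigi-based implementation described above; the task names (`parse_raw`, `compute_speed`, `compute_heatmap`, `store_results`) and the helper `parallel_batches` are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_batches(dependencies):
    """Group tasks into waves; tasks within one wave have no unmet dependencies.

    `dependencies` maps each task name to the set of tasks it depends on.
    Raises graphlib.CycleError (a ValueError) if the graph is not acyclic.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates that the graph is a DAG
    batches = []
    while sorter.is_active():
        # Every task returned here could be dispatched to a worker in parallel.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # Mark the whole wave as finished so dependent tasks become ready.
        sorter.done(*ready)
    return batches


# Hypothetical pipeline: task -> set of tasks that must finish first.
PIPELINE = {
    "parse_raw": set(),
    "compute_speed": {"parse_raw"},
    "compute_heatmap": {"parse_raw"},
    "store_results": {"compute_speed", "compute_heatmap"},
}

if __name__ == "__main__":
    for wave, tasks in enumerate(parallel_batches(PIPELINE)):
        print(f"wave {wave}: {tasks}")
```

In a setup like the one described, each task in a wave would presumably be placed on the job queue and picked up by a container, while the workflow manager tracks which dependencies are complete; the sketch only covers the ordering step.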

Ludwig Auer
Computational Scientist