Seminar: Distributed Systems
December 8, 2022, 12:15, room 4070, streamed online
Apache Airflow is a robust platform for programmatically authoring, scheduling, and monitoring workflows. It is designed to orchestrate complex data pipelines. It was initially developed to tackle the problems that come with long-running cron jobs and large scripts, but it has grown into one of the most powerful data pipeline platforms on the market (managed Airflow offerings are available from AWS and GCP).
At the meeting, I'll introduce the basic concepts used in most workflow management systems: schedules, DAGs, pipelines, tasks, orchestration, monitoring, remote execution, queues, and workers. In particular, we will take a bird's-eye view of Apache Airflow, which is widely used to build portable, modular pipelines for such use cases.
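As a rough illustration of what "tasks arranged in a DAG" means, the sketch below orders a small pipeline so that every task runs only after the tasks it depends on. It deliberately uses only the Python standard library (`graphlib`, Python 3.9+) rather than Airflow itself; the task names and the `run_pipeline` helper are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_pipeline(deps):
    """Visit tasks in an order that respects the DAG and return that order."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        # A real orchestrator would hand the task to a queue or worker here;
        # this sketch only records the order in which tasks become runnable.
        executed.append(task)
    return executed


order = run_pipeline(dependencies)
print(order)
```

A workflow manager such as Airflow adds scheduling, retries, monitoring, and remote execution on top of this basic dependency ordering.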
Everyone is welcome,
Filip Mikina
Bibliography: