Seminar: Distributed Systems
May 11, 2023, 12:15, room 4070
Recent progress in cloud technology and the emergence of new serverless offerings allow for the creation of real-time, cost-efficient applications that allocate resources dynamically and adapt to variable workloads. However, adapting a legacy application to a new environment can be challenging and may result in a suboptimal architecture due to platform limitations and past architectural decisions. An example of such an application is one that relies heavily on a SQL database and transactions.
In this work, we show how to adapt an existing application to a cloud-native serverless platform, sAirflow. The application is Apache Airflow, a prominent workflow management system that requires allocating multiple virtual machines to run. We show how to fully migrate Airflow by providing an event-driven control plane on Function-as-a-Service (FaaS) and new types of serverless task executors, which in turn allows users to pay only for the resources actually used rather than keeping machines up and running at all times. As a result, Airflow’s core components run on FaaS, and Airflow is able to launch its tasks (user-defined work) on serverless offerings. Because FaaS providers typically limit the maximum runtime of an invocation, we also provide a way to execute the work on Container-as-a-Service platforms, which have unlimited runtime but other disadvantages, such as a longer start-up time. Our prototype runs on AWS using the provider's serverless offerings.
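The trade-off between FaaS and Container-as-a-Service task execution can be illustrated with a minimal sketch. This is not the sAirflow implementation: the names `Task`, `choose_executor`, `expected_runtime_s`, and `safety_margin` are hypothetical, and the 900-second limit is an assumption that matches the AWS Lambda maximum timeout.

```python
from dataclasses import dataclass

# Assumption: 900 s matches the AWS Lambda maximum timeout; other providers differ.
FAAS_MAX_RUNTIME_S = 900


@dataclass
class Task:
    task_id: str
    expected_runtime_s: float  # hypothetical user-supplied runtime estimate


def choose_executor(task: Task, safety_margin: float = 0.8) -> str:
    """Return "faas" for short tasks and "caas" for tasks near or over the limit.

    FaaS starts quickly but caps the runtime; CaaS has no runtime cap
    but takes longer to start, so it is used only when needed.
    """
    if task.expected_runtime_s <= FAAS_MAX_RUNTIME_S * safety_margin:
        return "faas"
    return "caas"
```

For example, under these assumptions a one-minute task would be sent to FaaS, while a one-hour task would be sent to a container.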
Through a direct comparison with a managed version of Apache Airflow on AWS (Amazon MWAA), we show that the monetary cost can be reduced by up to 50% while maintaining overall performance. Additionally, we use real-world traces from Alibaba Cloud and micro-benchmarks to demonstrate that sAirflow introduces on average 10% more overhead per task, but is up to 7 times faster when scaling horizontally for FaaS task execution.
You are cordially invited,
Filip Mikina
Bibliography: