Seminar: Distributed Systems
November 20, 2025, 12:15, room 4070
Ignacy Gębuś, Maksym Matviienko



Communication-Aware ML Training



In my talk, I will introduce the key communication challenges in distributed training of modern ML models, which increasingly limit scalability in large GPU clusters. In this context, I will present and discuss Cassini, a network-aware scheduler designed to mitigate communication bottlenecks through topology-conscious placement and timing strategies. I will also outline a potential master's thesis topic on analyzing communication penalty functions for modeling such ML workloads.

I invite you to attend,
Ignacy Gębuś



Bibliography:





SIEVE: A Simple and Efficient Cache Eviction Algorithm



Cache eviction algorithms have grown increasingly complex in pursuit of better efficiency, yet none of the newer designs has achieved widespread adoption in production systems. SIEVE is a cache eviction algorithm that is simpler than LRU while achieving better efficiency than state-of-the-art algorithms and superior scalability. In this presentation, I will explain how SIEVE works and discuss the evaluation results demonstrating its advantages over existing approaches.

I invite you to attend,
Maksym Matviienko



Bibliography: