Seminar: Distributed Systems
May 22, 2025, 12:15, room 4070
Jakub Panasiuk, Michał Staniewski



SIMD-accelerated regex optimizations



Regular expressions are a powerful tool for pattern matching in text, but traditional implementations often struggle with performance, especially when processing large inputs or complex patterns. One promising approach to improving their efficiency is leveraging SIMD (Single Instruction, Multiple Data), a CPU capability that lets a single instruction operate on multiple data elements in parallel.

This work explores the integration of SIMD techniques with CTRE (Compile Time Regular Expressions), a modern C++ library by Hana Dusíková. CTRE compiles regular expressions at compile time using constexpr and template metaprogramming, eliminating runtime parsing overhead and enabling optimizations that are not possible in traditional engines. However, while CTRE provides a highly efficient foundation, it does not natively exploit SIMD parallelism.

The goal is to extend CTRE’s matching engine to take advantage of SIMD instructions (such as SSE and AVX) to accelerate pattern matching even further. This requires low-level modifications and a deep understanding of both the library's internals and the structure of typical regular expressions. The resulting implementation is evaluated in terms of performance improvements across a range of input types and pattern complexities.

You are welcome to attend,
Jakub Panasiuk





Asynchronous dynamic memory allocation on Intel GPUs



Asynchronous operations are crucial in GPU programming because they allow tasks to overlap, maximizing resource utilization. Within the Intel oneAPI framework, these capabilities can be extended with asynchronous memory allocation, which reduces the need for synchronization barriers, enhances memory reuse, and improves overall performance.

This work addresses the specific challenges of GPU memory allocators, proposes an efficient implementation of asynchronous dynamic memory allocation for the Unified Runtime library, and evaluates this solution on Intel GPUs, with a focus on reducing overhead and enhancing performance in parallel applications.

You are welcome to attend,
Michał Staniewski