Seminar: Distributed Systems
7 December 2021, 12:15, room 4070
Kacper Chętkowski, Tomasz Nowak

HadaFS: A File System Bridging the Local and Shared Burst Buffer for Exascale Supercomputers

Current supercomputers introduce a dedicated Burst Buffer (BB) layer, built from SSDs, to meet the growing I/O demands of HPC applications. BBs fall into two kinds depending on where they are located. The first is the local BB, known for its scalability and performance. The second is the shared BB, which offers data sharing and lower maintenance costs. How to combine the advantages of the local and the shared BB is a key problem for the HPC community.

The authors present a new BB file system named HadaFS, which combines the advantages of the local and the shared BB layer. First, HadaFS offers a new Localized Triage Architecture (LTA) to address ultra-scalable expansion and data sharing. Next, HadaFS proposes full-path indexing with three different metadata synchronization strategies to address the complex metadata management of traditional file systems and its mismatch with the I/O behavior of applications. In addition, HadaFS integrates a data management tool named Hadash, which supports efficient data queries in the BB and accelerates data migration between the BB and traditional HPC storage. HadaFS has been deployed on the Sunway New-generation Supercomputer (SNS), serving hundreds of applications and supporting up to 600,000 clients.
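To make the idea of full-path indexing concrete, here is a toy sketch in OCaml. It is not taken from HadaFS; the record fields, function names, and paths are all hypothetical. It only shows the general technique: metadata lives in one flat table keyed by the complete path, so a lookup is a single probe rather than a walk through each directory component.

```ocaml
(* Toy metadata store keyed by the full path.
   Every name here is hypothetical and unrelated to the HadaFS code base. *)
type meta = { size : int; owner : string }

let table : (string, meta) Hashtbl.t = Hashtbl.create 16

let create (path : string) (m : meta) : unit = Hashtbl.replace table path m

(* A lookup is one hash probe; no per-directory traversal is needed. *)
let stat (path : string) : meta option = Hashtbl.find_opt table path

let has_prefix ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* The price in this toy version: finding everything under a directory
   means scanning all keys for a common prefix. *)
let list_under (dir : string) : string list =
  let prefix = dir ^ "/" in
  Hashtbl.fold
    (fun path _ acc -> if has_prefix ~prefix path then path :: acc else acc)
    table []
  |> List.sort compare

let () =
  create "/job/out/a.dat" { size = 4096; owner = "alice" };
  create "/job/out/b.dat" { size = 10; owner = "alice" };
  create "/job/log.txt" { size = 1; owner = "bob" };
  List.iter print_endline (list_under "/job/out")
```

In a sketch like this, `stat` stays cheap no matter how deep the path is, while directory-style operations become the expensive case.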

https://www.usenix.org/conference/fast23/presentation/he

https://www.mcs.anl.gov/papers/P2070-0312.pdf

Authors: Xiaobin He, Bin Yang, Jie Gao, Wei Xiao, Qi Chen, Shupeng Shi, Dexun Chen, Weiguo Liu, Wei Xue, Zuo-ning Chen

Everyone is welcome,
Kacper Chętkowski

Implementing OCaml support in debuggers

I will talk about my master's thesis, which, surprisingly, is about compilers rather than distributed systems. The project is already finished and I am now writing up the PDF. A detailed description of the project follows:

Up until now, there was no convenient method for debugging programs written in the OCaml programming language. This thesis presents work done in collaboration with Jane Street to add support for the OCaml language to debuggers such as GDB and LLDB. Now, OCaml programmers can easily print the contents of a variable of any type in LLDB after running the executable under the debugger.

Before the work presented in this thesis, there were two usual methods of debugging OCaml programs, both with significant drawbacks. The first, more popular method was to define a custom printing function for a given type and then print the contents of variables from within the code. This often required adding a significant amount of one's own code and frequently recompiling the program. The second method was to use ocamldebug, the debugger for OCaml bytecode. Unfortunately, not all code can be compiled to bytecode (e.g. because of some C stubs), and this method cannot be used to debug applications that are actually running in production, as those are usually compiled to native code for performance reasons.
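As an illustration of the first method, the following sketch shows what such hand-written printers typically look like. The types `shape` and `scene` and both printer functions are made up for this example and do not come from the thesis.

```ocaml
(* Hypothetical application types. *)
type shape =
  | Circle of float
  | Rect of float * float

type scene = { name : string; shapes : shape list }

(* One printer has to be written by hand for every type to be inspected. *)
let string_of_shape = function
  | Circle r -> Printf.sprintf "Circle %g" r
  | Rect (w, h) -> Printf.sprintf "Rect (%g, %g)" w h

let string_of_scene s =
  Printf.sprintf "{ name = %S; shapes = [%s] }" s.name
    (String.concat "; " (List.map string_of_shape s.shapes))

let () =
  let s = { name = "demo"; shapes = [ Circle 1.5; Rect (2., 3.) ] } in
  (* Temporary debugging output: every new inspection point means
     editing the source and rebuilding the program. *)
  prerr_endline (string_of_scene s)
```

Even for two small types, the printers are about as long as the type definitions themselves, and they have to be kept in sync whenever a type changes.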

Before my work, it was already possible to print 63-bit integers in the debuggers. The executables already carried enough debug information to locate, at run time, a subset of the variables in the source code, such as function parameters. Nevertheless, implementing full support for OCaml types required many steps.
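For readers wondering why the integers are 63-bit: on 64-bit platforms the OCaml runtime stores an integer n in a machine word as 2n + 1, reserving the lowest bit as a tag. The sketch below models that encoding with `Int64`; the helper names `tag` and `untag` are made up for this illustration and say nothing about how the debugger support itself is implemented.

```ocaml
(* Model of how an OCaml int n sits in a 64-bit word: 2n + 1. *)
let tag (n : int) : int64 =
  Int64.logor (Int64.shift_left (Int64.of_int n) 1) 1L

(* Recovering the value a programmer expects to see:
   an arithmetic shift right by one bit drops the tag. *)
let untag (w : int64) : int = Int64.to_int (Int64.shift_right w 1)

let () =
  (* Sys.int_size reports the width of int; assumed to be 63 on 64-bit targets. *)
  Printf.printf "int_size = %d\n" Sys.int_size;
  Printf.printf "21 is stored as %Ld\n" (tag 21);
  Printf.printf "%Ld decodes to %d\n" (tag (-7)) (untag (tag (-7)))
```

A tool that reads raw memory without knowing about this encoding would display 43 where the program holds 21.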

Even though OCaml is a strongly typed language, almost all type information is stripped during compilation. The first step was to pass type information from the front-end of the OCaml compiler all the way to the back-end.
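A small experiment shows what this stripping means in practice. The sketch below uses the standard `Obj` module to look at run-time representations; the helper `word_of` and the type `color` are made up for the example. Several values of entirely different types turn out to be the same immediate word, so without extra type information a debugger cannot tell them apart.

```ocaml
(* Hypothetical enumeration type for the demonstration. *)
type color = Red | Green

(* Returns the integer stored in an immediate value,
   or None when the value is a pointer to a heap block. *)
let word_of (x : 'a) : int option =
  let r = Obj.repr x in
  if Obj.is_int r then Some (Obj.obj r : int) else None

let show label w =
  match w with
  | Some n -> Printf.printf "%-6s -> %d\n" label n
  | None -> Printf.printf "%-6s -> (heap block)\n" label

let () =
  show "0" (word_of 0);
  show "false" (word_of false);
  show "()" (word_of ());
  show "[]" (word_of []);
  show "None" (word_of None);
  show "Red" (word_of Red);
  show "\"text\"" (word_of "text")
```

All entries except the string are reported as 0, which is why the compiler has to carry types through to the back-end before useful debug information can be emitted.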

This made it possible to implement, in the back-end, the transformation of types into the universal format defined by the DWARF specification, which can be read by portable debuggers. Because OCaml variables can be polymorphic and the specification does not support polymorphism, all polymorphic types have to be fully resolved in the end. Handling edge cases in the transformation turned out to be challenging.

The produced debug information could already be read by debuggers like GDB and LLDB, but that did not mean those debuggers fully supported all the needed functionality. The biggest issue turned out to be variant types: depending on a discriminant, different data can be attached to a variable, not necessarily all of the same size. GDB supports variant types, but only of fixed size (like union types in imperative languages). LLDB did not support variant types at all.
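The following sketch, with a made-up type `shape` and helper `describe`, shows why OCaml variants do not fit a fixed-size union. It inspects the run-time representation through the standard `Obj` module: a constructor without arguments is an immediate integer, while constructors with arguments are heap blocks whose tag acts as the discriminant and whose size depends on the constructor.

```ocaml
(* Hypothetical variant type with payloads of different sizes. *)
type shape =
  | Point               (* constant constructor: no payload *)
  | Circle of int       (* one field *)
  | Rect of int * int   (* two fields *)

(* Describes how a value of type shape is laid out at run time. *)
let describe (s : shape) : string =
  let r = Obj.repr s in
  if Obj.is_int r then Printf.sprintf "immediate %d" (Obj.obj r : int)
  else Printf.sprintf "block tag=%d fields=%d" (Obj.tag r) (Obj.size r)

let () =
  List.iter
    (fun s -> print_endline (describe s))
    [ Point; Circle 5; Rect (2, 3) ]
```

The three values print as `immediate 0`, `block tag=0 fields=1` and `block tag=1 fields=2`, so a debugger has to read the discriminant first and only then decide how many fields to display.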

Because Jane Street decided to use LLDB, I implemented support for variant types there. The method turned out to be quite generic, so it can be extended to any language. Of course, I also had to add OCaml-specific code to LLDB, such as formatting the contents of variables and handling OCaml-specific edge cases related to the memory representation of variables.

This was the first attempt at constructing DWARF information for the OCaml language, and it has turned out to be successful. At the time of writing this thesis, Jane Street plans to include my work in the toolchain of all their programmers and to construct debug information by default. After all, this approach has no drawbacks besides an increase in binary size. Additionally, both the changes to LLDB concerning general variant type support and the OCaml type support might be upstreamed into LLDB itself.

Everyone is welcome,
Tomasz Nowak