.. _z2-ebpf:

==========================================
Assignment 2: BPF compressibility analyzer
==========================================

Announcement date: 25.03.2025

Due date: 06.05.2025 (final due date 20.05.2025)

.. toctree::
   :hidden:

Additional materials
====================

- ``.config`` file: :download:`CONFIG_Z2`
- tests: :download:`z2-tst-public.tar.xz`

Introduction
============

Data compression reduces the size of stored data at the cost of additional CPU and memory usage. Moreover, some data compress better than others, and in extreme cases compressed data may occupy more space than the original. As a result, even though some file systems (e.g., F2FS, btrfs, ZFS) offer convenient built-in compression, it is often left unused.

One of the most important features of BPF technology is that it makes it easier to monitor and observe what happens in the operating system. A valuable piece of information that could be obtained by extending the existing BPF implementation in the Linux kernel is the compressibility of data written to the system. The goal of this assignment is to enable the operating system to collect such information.

Assignment
==========

Implement a system for analyzing data compressibility during write operations. The system should allow using the LZO compression implementation available in the Linux kernel to verify compressibility, but it should also make it possible to write a custom compression algorithm as a BPF program or to combine both methods.

To this end, introduce new system calls and a way to hook a compression verification mechanism into the kernel - a pair of BPF programs that are executed on the following system calls, respectively:

#. ``open``, ``openat``, ``openat2``, ``creat`` - to determine whether compression verification should be performed on subsequent write operations to the file
#. ``write``, ``pwrite``, ``writev``, ``pwritev2`` - for files that require verification, these calls should check the compression and store the analysis results

A BPF program installed this way should have read access to the ``compress_ctx`` structure described below, as well as to information about the file and the data currently being written.

The result of the verification, i.e., the size of the data before and after compression, should be recorded. To access or reset the recorded results, two new system calls should be implemented:

.. code:: c

   int get_compression_stats(int fd, size_t __user *bytes_written,
                             size_t __user *bytes_after_compression);
   int reset_compression_stats(int fd);

Technical details
=================

- Add a new BPF program type: ``BPF_PROG_TYPE_COMPRESSOR`` (with a number one higher than ``BPF_PROG_TYPE_NETFILTER``).
- Add a new BPF attach type: ``BPF_COMPRESSOR`` (with a number one higher than ``BPF_TRACE_KPROBE_SESSION``).

The data passed in the BPF call context should have the following format:

.. code:: c

   struct compress_ctx {
       union {
           struct {
               loff_t offset;
               size_t size;
           };
           struct {
               kuid_t uid;
               kgid_t gid;
           };
       };
   };

Two new BPF hook points should be added:

- ``int bpf_compressor_decide(struct compress_ctx *)`` - a positive return value indicates that the file should undergo compressibility verification.
- ``int bpf_compressor_verify(struct compress_ctx *)`` - performs compressibility verification; in case of an error, the syscall being executed should be aborted with ``-EINVAL``.

The newly added BPF program type should be attachable to these hooks (and only to them) via ``BPF_RAW_TRACEPOINT_OPEN``.

There is no need to support ``mmap``, ``vmsplice``, ``splice``, ``tee``, or ``sendfile`` in this assignment.
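
As an illustration of how a user-space tool might consume the two counters reported by ``get_compression_stats``, the sketch below wraps the syscall and converts the counters into a percentage of space saved. The syscall number ``1001`` is a placeholder and the helper ``space_saved_percent`` is a hypothetical name; neither comes from the assignment, and the real number depends on where your patch registers the syscall.

.. code:: c

   #define _GNU_SOURCE
   #include <stddef.h>
   #include <sys/syscall.h>
   #include <unistd.h>

   /* PLACEHOLDER syscall number (assumption): the assignment does not fix one.
    * Replace it with the number your patch assigns in the syscall table. */
   #ifndef __NR_get_compression_stats
   #define __NR_get_compression_stats 1001
   #endif

   /* Thin wrapper over the new syscall. Like any syscall(2) call, it returns
    * -1 and sets errno on failure (e.g. ENOSYS on an unpatched kernel). */
   int get_compression_stats(int fd, size_t *bytes_written,
                             size_t *bytes_after_compression)
   {
       return (int)syscall(__NR_get_compression_stats, fd, bytes_written,
                           bytes_after_compression);
   }

   /* Hypothetical helper: space saved as a percentage of the original size.
    * Negative when the compressed data is larger than the input,
    * 0 when nothing has been written yet. */
   double space_saved_percent(size_t bytes_written,
                              size_t bytes_after_compression)
   {
       if (bytes_written == 0)
           return 0.0;
       return 100.0 * (1.0 - (double)bytes_after_compression /
                                 (double)bytes_written);
   }

A caller would typically invoke ``get_compression_stats`` on an open file descriptor and pass both counters to the helper only if the call succeeded. A negative percentage corresponds to the case mentioned in the introduction, where compressed data takes more space than the original.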

New functions should be added and made available to the new BPF program type (and only to it) so that a BPF program can perform compressibility verification:

- ``int bpf_copy_from_buffer(void *ctx, unsigned long offset, void *ptr, unsigned long size)`` - copies data from the buffer in the context to the memory pointed to by ``ptr``.
- ``int bpf_lzo_compress(void *ctx)`` - compresses the data from the buffer in the context and returns the size of the compressed data, or a negative number in case of an error.

The new program type should have access to the following functions (and no others):

- the base set of common BPF functions
- the above new functions
- the existing functions ``get_current_uid_gid`` and ``get_current_pid_tgid``

Compiling the solution
======================

Running the solution requires a properly configured Linux kernel with settings such as ``CONFIG_DEBUG_INFO_BTF=y``. These settings were intentionally omitted from the configuration provided in previous materials because they increase the compilation time and resource requirements. Because of this, it may be necessary to expand the virtual machine image by a few gigabytes (the ``qemu-img resize`` command might be helpful). The easiest way to test the solution is to use the configuration file provided in the additional materials section.

Compiling the tests requires the ``vmlinux.h`` file, which is not included with the tests. It can be generated, for example, with the ``bpftool`` program. No additional modifications to the provided ``libbpf`` library are required.

To compile the solution and tests inside the virtual machine, you must also install the ``dwarves`` and ``clang`` packages.

Hints
=====

A fix for infinite loops that was recently implemented in the BPF subsystem (https://lore.kernel.org/bpf/20241015150207.70264-2-leon.hwang@linux.dev/T/) inelegantly introduces an assumption that ``bpf_tramp_prog_type == BPF_TRAMP_REPLACE`` implies ``tgt_prog != NULL``.
The solution must take this into account; otherwise the kernel will panic due to NULL dereferences.

During testing, an error introduced by the student may prevent the virtual machine from booting. If this happens, it is recommended to use the ``-kernel``, ``-hda``, and ``-append "root=/dev/sda3"`` options with previously copied working ``vmlinuz`` and ``initrd`` files (https://qemu-project.gitlab.io/qemu/system/linuxboot.html). If you did not prepare these files beforehand, you can extract them from the base image used in the labs.

To compile the kernel using multiple CPU cores, remember to pass the ``-j`` option to ``make`` and ensure that the virtual machine can use multiple cores (e.g., via the ``-smp`` option).

Solution format
===============

Submit a package containing:

- a patch for the Linux kernel version 6.12.6 generated using ``git format-patch``
- a short description of the solution