.. _lab-bpf-en:

===============================================
Class 6: BPF
===============================================

Date: 01.04.2025
:ref:`small_task_5_en`

.. notes::
    From Andrzej::

     = Introduction
     - Last day to submit the first large assignment on time
     - Today: presentation of the second assignment

     = BPF
     - Go through the lab materials
     - It is also worth showing the BPF website (https://ebpf.io/) and parts of the LISA21 slides (https://www.brendangregg.com/Slides/LISA2021_BPF_Internals.pdf)
     - It is worth advertising the (free) book "Learning eBPF" by Liz Rice (https://isovalent.com/books/learning-ebpf/)

     = Small task 4 (also on BPF)
     Briefly go over the task and encourage students to do it as a warm-up before the second large assignment.

     = Overview of Large Assignment 2
     General overview of the assignment.
     Stress that a specific .config file must be used. It is worth starting with a "dry run" build using that config.
     The test that uses bpf_simple does not require implementing any new functions, so it is a good place to start.

     Worth discussing /boot/vmlinuz, vmlinux-extracts, and the symbols in system.map; all of this helps with debugging.
     Show the bpf.h file: bpf_prog_type, bpf_attach_type
     Show bpftool, in particular: sudo bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
     Briefly discuss bpf/cgroup.c, which is a good model to follow, in particular cgroup_current_func_proto
     Briefly discuss bpf/helpers.c (in particular bpf_func_proto); a significant part of the solution will live here.

.. tip::
    Useful links:

    - `Introduction to eBPF <https://ebpf.io/what-is-ebpf/>`_
    - `Kernel BPF documentation <https://docs.kernel.org/bpf/index.html>`_
    - `eBPF on Linux <https://docs.ebpf.io/linux/>`_ -- a bit nicer than the above
    - `BPF and XDP Reference Guide <https://docs.cilium.io/en/stable/reference-guides/bpf/index.html>`_ with technical details


.. admonition:: Hands-on

    For the labs today, you will need to download a prebuilt kernel image (unless you already have a working image for the BPF large task)::

        https://students.mimuw.edu.pl/ZSO/PUBLIC-SO/vmlinuz-6.12.6zsobpf
        https://students.mimuw.edu.pl/ZSO/PUBLIC-SO/initrd.img-6.12.6zsobpf


    Boot the QEMU image with these options in addition to your usual ones::

        -kernel vmlinuz-6.12.6zsobpf -initrd initrd.img-6.12.6zsobpf -append "root=/dev/sda3"

    Then, install these dependencies inside the QEMU guest::

        apt install clang clang-14 llvm pahole bpftool bpftrace bpfcc-tools libbpfcc libbpfcc-dev libbpf-dev

    Use the superuser account for all the commands today.

Introduction
============

BPF (Berkeley Packet Filter) is a technology that allows user-space processes to supply filtering programs.
In short, BPF enables writing small programs (which are not kernel modules) that execute in kernel mode.
A simple example of a BPF program (available in ``man 2 bpf``) is a filter that counts TCP and UDP packets received by the operating system.

BPF has many practical applications, including security, tracing and profiling processes, managing network interfaces, and system monitoring [1].
This technology is gaining popularity; for instance, in 2019, Netflix used 15 BPF programs by default, while Facebook used 40 in production [3].
It has also been developing rapidly in recent years: the ``linux/kernel/bpf`` directory was modified by almost 400 commits in 2021.

One unquestionable advantage of BPF is its high efficiency, allowing execution of relatively simple programs for every packet at 10 Gb/s
without noticeable delays [5]. However, BPF programs are not necessarily faster than their in-kernel equivalents [6];
their main feature is enabling execution of user-supplied code in kernel mode. Since this can obviously pose security risks,
BPF programs run in a sandbox environment after being verified, as we will discuss shortly.

Regarding terminology, the BPF abbreviation originates from the 1992 publication "The BSD Packet Filter" [7]. Linux 3.18 introduced extended
BPF (eBPF) with, among other things, support for 64-bit registers, and the older version came to be called cBPF (classic BPF). Nowadays, the technology
is generally referred to as BPF, though the term eBPF can still be encountered [2].

Types of BPF Programs
=====================

BPF programs can be of `various types <https://docs.ebpf.io/linux/program-type/>`_, specified in ``enum bpf_prog_type`` in ``include/uapi/linux/bpf.h``.

Kernel version 6.12 contains over 30 types, of which some of the more important are:

- ``BPF_PROG_TYPE_SOCKET_FILTER`` for filtering (dropping or truncating) packets received on a socket
- ``BPF_PROG_TYPE_KPROBE`` for function instrumentation
- ``BPF_PROG_TYPE_XDP`` to decide the fate of packets early in their processing
  (before performing costly operations), which is useful for DDoS protection
- ``BPF_PROG_TYPE_CGROUP_*`` for additional cgroup permission management

.. notes::
     Show the bpf.h file and, in it, bpf_prog_type and bpf_attach_type

Creating BPF Programs
=====================

BPF programs resemble assembly but have their own register set and instruction set.
They provide 11 registers to the programmer: R0-R9 (read/write) and R10 (read-only stack frame pointer, similar to RBP in x86_64).
Registers are modified by numerous instructions [8], which allow for:

- arithmetic operations (e.g. ``BPF_ADD``, ``BPF_MUL``),
- jumps and function calls (e.g. ``BPF_JEQ``, ``BPF_JLE``, ``BPF_CALL``),
- loading and storing values (e.g. ``BPF_LD``, ``BPF_ST``).

One way to write a BPF program is to fill in an array of ``struct bpf_insn`` by hand (as with the macros in ``samples/bpf/bpf_insn.h``).
However, this has obvious disadvantages (just like writing programs in assembly),
so there are tools available for writing BPF programs in programming languages such as C, C++, Python, or Go, including bcc and libbpf.
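
To make the encoding concrete, here is a minimal user-space sketch that fills in ``struct bpf_insn`` by hand
for a three-instruction program (``r0 = 1; r0 += 41; exit``) and prints the fields of each instruction.
The helper names ``mov64_imm``, ``add64_imm`` and ``exit_insn`` are made up for this example
(the kernel samples use macros instead); only the structure and the ``BPF_*`` constants come from ``<linux/bpf.h>``.
Nothing is loaded into the kernel here.

.. code:: c

    #include <stdio.h>
    #include <linux/bpf.h>

    /* dst = imm (64-bit move of an immediate value) */
    static struct bpf_insn mov64_imm(__u8 dst, __s32 imm)
    {
        return (struct bpf_insn){
            .code = BPF_ALU64 | BPF_MOV | BPF_K,
            .dst_reg = dst,
            .imm = imm,
        };
    }

    /* dst += imm (64-bit addition of an immediate value) */
    static struct bpf_insn add64_imm(__u8 dst, __s32 imm)
    {
        return (struct bpf_insn){
            .code = BPF_ALU64 | BPF_ADD | BPF_K,
            .dst_reg = dst,
            .imm = imm,
        };
    }

    /* End the program; the value left in R0 is its result. */
    static struct bpf_insn exit_insn(void)
    {
        return (struct bpf_insn){ .code = BPF_JMP | BPF_EXIT };
    }

    int main(void)
    {
        struct bpf_insn prog[] = {
            mov64_imm(BPF_REG_0, 1),
            add64_imm(BPF_REG_0, 41),
            exit_insn(),
        };

        /* Every instruction occupies exactly 8 bytes. */
        for (size_t i = 0; i < sizeof(prog) / sizeof(prog[0]); i++)
            printf("%zu: code=0x%02x dst=r%d src=r%d off=%d imm=%d\n",
                   i, prog[i].code, prog[i].dst_reg, prog[i].src_reg,
                   prog[i].off, prog[i].imm);
        return 0;
    }

The sketch builds with a plain C compiler and needs no privileges, since it only prints the encoded instructions.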

Since BPF programs are executed in a sandbox, they cannot (`typically <https://docs.kernel.org/bpf/kfuncs.html>`_) call arbitrary kernel
functions and have limited options for interaction with the outside world.
Instead, they use `helper functions <https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_, whose capabilities include:

- simple printing (``bpf_trace_printk``),
- retrieving context information (e.g. `bpf_get_current_uid_gid <https://elixir.bootlin.com/linux/v6.12.6/source/kernel/bpf/helpers.c#L222>`_),
- communicating with user space through various types of associative arrays (``bpf_map_*``),
- performing operations specific to the program type (e.g., dropping a packet),
- invoking other BPF programs (``bpf_tail_call``).

A prepared BPF program is verified by the kernel and then compiled using JIT (just-in-time compilation) into machine code.
The code responsible for compilation for the x86 architecture is located in the file ``arch/x86/net/bpf_jit_comp.c``.
After compilation, the BPF program can be executed.
The BPF program is typically accompanied by a user-space program that mediates communication with it.

For the documentation of helper functions,
see the `eBPF Docs <https://docs.ebpf.io/linux/helper-function/>`_ and the `manpage <https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_.

Kernel Tracing
==============

Linux has multiple facilities for tracing and observing what is happening inside the kernel.
The most important pieces include `tracepoints <https://docs.kernel.org/trace/tracepoints.html>`_,
`ftrace <https://docs.kernel.org/trace/ftrace.html>`_, and
`Kprobes <https://docs.kernel.org/trace/kprobetrace.html>`_.
In short, they allow hooking (*placing probes*) at various places.
Of special interest are **dynamic** traces, which allow hooking at runtime with virtually no overhead while no probe is attached.

The idea of `tracepoints <https://docs.kernel.org/trace/tracepoints.html>`_ is straightforward:
we explicitly place code checking if a probe is connected, and if so, call it with some arguments.
Function tracing with *ftrace* is a bit trickier, as we need help from the compiler to put a stub call at each function entry.
With `dynamic ftrace <https://docs.kernel.org/trace/ftrace.html#dynamic-ftrace>`_ on x86, you can notice a call to ``__fentry__`` at the start of almost every function
(check it yourself with ``objdump --disassemble=vfs_write vmlinux | less``).
The function entry hook is also used to place a function exit hook: we just need to replace the return pointer on the stack
with a pointer to a specially crafted trampoline.
As an extra optimization, the kernel will self-modify and replace these calls with NOPs until they are needed.
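
The tracepoint idea can be illustrated with a toy user-space model: a hook that costs one check when nothing is attached
and calls the probe otherwise. This is only a conceptual sketch; every name in it is invented for the example,
and the kernel's real implementation is considerably more involved.

.. code:: c

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical probe signature for an imaginary "file write" tracepoint. */
    typedef void (*write_probe_t)(const char *file, size_t count);

    /* NULL means that no probe is attached and the hook does nothing. */
    static write_probe_t write_probe;

    /* How many times the example probe has been called. */
    static unsigned long probe_hits;

    static void counting_probe(const char *file, size_t count)
    {
        probe_hits++;
        printf("write of %zu bytes to %s\n", count, file);
    }

    /* The "tracepoint": a single check, and a call only when a probe is attached. */
    static void trace_file_write(const char *file, size_t count)
    {
        if (write_probe)
            write_probe(file, count);
    }

    /* Stand-in for the traced function. */
    static size_t file_write(const char *file, size_t count)
    {
        trace_file_write(file, count);
        return count;
    }

    int main(void)
    {
        file_write("a.txt", 10);        /* nothing attached: the probe is not called */
        write_probe = counting_probe;   /* attach at run time */
        file_write("a.txt", 20);
        write_probe = NULL;             /* detach again */
        file_write("a.txt", 30);
        printf("probe ran %lu time(s)\n", probe_hits);
        return 0;
    }

Attaching and detaching is just a pointer assignment here, which is what makes this kind of hook usable at run time.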

Kprobes are more powerful, as they allow hooking at individual instructions.
In principle, a kprobe works by replacing the instruction in question with a breakpoint instruction to redirect the execution flow;
the kernel then runs the registered probes, executes the original instruction, and returns to the main flow.

.. notes::
    Show an entrypoint of a function when compiled with FUNCTION_TRACER/DYNAMIC_FTRACE:

    ``objdump --disassemble=vfs_write vmlinux | less``

Hands-on
========

.. admonition:: Hands-on

    First, check that the necessary kernel features are enabled with `bpftool <https://www.mankier.com/package/bpftool>`_::

        bpftool feature probe

    ``bpftool`` is developed alongside ``libbpf`` in the main kernel tree.

bpftrace
--------

``bpftrace`` is a tool for quick hacking and prototyping around the BPF probe facilities.

.. admonition:: Hands-on

    You may check a list of all available probes with::

        bpftrace -l

    Go ahead and run your first BPF program with something like::

        bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'

    Then execute ``sleep`` in another terminal.
    Keep the trace running, as we will examine it with bpftool.
    `bpftool prog list <https://www.mankier.com/8/bpftool-prog>`_ lists the currently installed BPF programs::

        bpftool prog list

    You can see the BPF instructions (after initial translation by the kernel) with::

        bpftool prog dump xlated id <id>
        # or, in this specific case, just:
        bpftool prog dump xlated name do_nanosleep

    You can see what maps are being used with `bpftool map <https://www.mankier.com/8/bpftool-map>`_::

        bpftool map

    In this case, bpftrace uses ``perf_event_array`` to implement its ``printf``.
    You may read these events with::

        bpftool map event_pipe id <id>

libbpf
------

.. important::
    When building libbpf programs out-of-tree, you will need to provide type information about non-stable kernel functions and structures
    (for example, after you modify the BPF facilities).
    You may extract it with::

        sudo bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

.. admonition:: Hands-on

    Let's rewrite our probe in C:

    .. code:: c

        #define BPF_NO_GLOBAL_DATA
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_tracing.h>


        SEC("kprobe/do_nanosleep")
        int handle(void *ctx)
        {
            int pid = bpf_get_current_pid_tgid() >> 32;
            bpf_printk("PID %d is sleeping", pid);

            return 0;
        }

        char LICENSE[] SEC("license") = "GPL";

    And compile it with::

        clang --target=bpf -g -Og -c example.bpf.c -o example.bpf.o

    You may use ``llvm-readelf`` and ``llvm-objdump`` to inspect that file.

    If you have a modern version of bpftool (e.g., compiled in ``linux-6.12.6/tools/bpf/bpftool`` with ``make``),
    you can just run::

        bpftool prog load example.bpf.o /sys/fs/bpf/example autoattach

    If your version does not support the 'autoattach' option yet, you will have to use libbpf for loading the program.
    The simplest way is to generate a skeleton file like::

        bpftool gen skeleton example.bpf.o name example > example.skel.h

    And write a loader file like:

    .. code:: c

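        #include <unistd.h>
        #include "example.skel.h"

        int main(void)
        {
            struct example *skel;
            int err = 1;    /* assume failure until every step succeeds */

            /* Parse the BPF object embedded in the skeleton. */
            skel = example__open();
            if (!skel)
                goto cleanup;

            /* Verify the program and load it into the kernel. */
            err = example__load(skel);
            if (err)
                goto cleanup;

            /* Attach it to the hook named in SEC(). */
            err = example__attach(skel);
            if (err)
                goto cleanup;

            /* Keep the program attached until the process is interrupted. */
            pause();

        cleanup:
            example__destroy(skel);    /* safe to call with NULL */
            return err != 0;
        }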

    Which may be compiled and executed with::

        gcc example.user.c -o example.user -lbpf
        ./example.user


    Either way, open the trace printk log with::

        bpftool prog tracelog

    And execute a sleep program in another terminal.
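
The ``>> 32`` in the probe deserves a comment: ``bpf_get_current_pid_tgid`` packs two identifiers into one 64-bit value,
the thread group ID (what user space calls the PID) in the upper half and the thread ID in the lower half.
Below is a small user-space sketch of the unpacking; the helper names and the sample value are made up for illustration.

.. code:: c

    #include <stdint.h>
    #include <stdio.h>

    /* Upper 32 bits: thread group ID, i.e. the process ID seen from user space. */
    static uint32_t tgid_of(uint64_t pid_tgid)
    {
        return (uint32_t)(pid_tgid >> 32);
    }

    /* Lower 32 bits: ID of the individual thread. */
    static uint32_t tid_of(uint64_t pid_tgid)
    {
        return (uint32_t)pid_tgid;
    }

    int main(void)
    {
        /* Made-up value: thread 1240 of process 1234. */
        uint64_t pid_tgid = ((uint64_t)1234 << 32) | 1240;

        printf("process %u, thread %u\n", tgid_of(pid_tgid), tid_of(pid_tgid));
        return 0;
    }

For a single-threaded program such as ``sleep`` both halves are equal, so either would identify the process.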

BPF Programs from the Kernel Perspective
========================================

The basic path for running a BPF program starts with the use of the ``bpf_prog_load`` function,
which receives the BPF program type along with a list of instructions.
This function performs verification and loading of the program, and then returns a file descriptor associated with the program.
This file descriptor can then be used, for example, by passing it as an argument to the ``setsockopt`` function or ioctl with the request ``PERF_EVENT_IOC_SET_BPF``.
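
From user space, this path is reached through the ``bpf`` system call with the ``BPF_PROG_LOAD`` command.
The following is a minimal sketch, modelled on the example in ``man 2 bpf``, that loads a two-instruction socket filter (``r0 = 0; exit``)
without any library. The function name ``fill_load_attr`` and the log buffer size are choices made for this example;
loading requires sufficient privileges, so expect it to fail for an ordinary user.

.. code:: c

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    /* The verifier writes its log here when log_level is non-zero. */
    static char verifier_log[4096];

    /* Describe the program to the kernel: type, instructions, license, log buffer. */
    static void fill_load_attr(union bpf_attr *attr,
                               const struct bpf_insn *insns, unsigned int insn_cnt)
    {
        memset(attr, 0, sizeof(*attr));
        attr->prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
        attr->insns = (uint64_t)(uintptr_t)insns;
        attr->insn_cnt = insn_cnt;
        attr->license = (uint64_t)(uintptr_t)"GPL";
        attr->log_buf = (uint64_t)(uintptr_t)verifier_log;
        attr->log_size = sizeof(verifier_log);
        attr->log_level = 1;
    }

    int main(void)
    {
        /* r0 = 0; exit */
        const struct bpf_insn prog[] = {
            { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
            { .code = BPF_JMP | BPF_EXIT },
        };
        union bpf_attr attr;
        int fd;

        fill_load_attr(&attr, prog, sizeof(prog) / sizeof(prog[0]));

        /* Verification and loading both happen inside this call. */
        fd = (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
        if (fd < 0) {
            fprintf(stderr, "BPF_PROG_LOAD failed: %s\n%s",
                    strerror(errno), verifier_log);
            return 1;
        }

        printf("program loaded, fd = %d\n", fd);
        close(fd);    /* the program is freed once nothing refers to it */
        return 0;
    }

The returned descriptor is what would then be handed to ``setsockopt`` (for a socket filter, with the ``SO_ATTACH_BPF`` option) to attach the program.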

BPF Program Verification
------------------------

Because the BPF program is delivered from user space and executed in kernel mode, additional verification of the program's correctness is needed
to prevent unauthorized memory accesses as well as accidental errors that could crash the entire system.
The verification process consists of two stages.

Aspects verified in the first stage include:

- User permissions (by default, only users with ``CAP_BPF`` can load programs).
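- Program size (at most ``BPF_MAXINSNS``, i.e. 4096, instructions for unprivileged users; privileged loaders are instead limited to ``BPF_COMPLEXITY_LIMIT_INSNS``, i.e. one million, instructions).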
- Presence of loops. Since kernel version 5.3, limited loops (bounded loops) are allowed, for which the halting property can be easily proven.
- Function calls. Generally, you cannot call functions that do not belong to the group of BPF helpers.
- Reachability of all instructions.

The second stage is more complicated.
The verifier starts from the first instruction of the program and tries to explore all possible execution paths,
while verifying its state, the contents of registers, and the operations performed on them.
To verify the state, the ``bpf_reg_state`` structure available in ``include/linux/bpf_verifier.h`` is used,
which stores, among other things, the types of values (``bpf_reg_type`` in ``include/linux/bpf.h``).
A value can have the type ``NOT_INIT``, ``SCALAR_VALUE``, or one of the pointer types (e.g. ``PTR_TO_CTX``, ``PTR_TO_STACK``, ``PTR_TO_PACKET``).
Pointer operations can change their type, e.g., adding two ``PTR_TO_CTX`` results in ``SCALAR_VALUE``
and from that moment on we can no longer access memory from this value (this could allow unauthorized access to memory).

Examples
--------

The implementation of helper functions can be found in ``kernel/bpf/helpers.c``.
When implementing a new type of program, it is advisable to model it after other relatively simple ones, such as ``kernel/bpf/cgroup.c``.

.. _small_task_5_en:

Small Task #5
=============

Implement a program ``show_bt`` that displays the backtrace for function calls in the kernel code made in the last 5 seconds.
For example, a call ``./show_bt vfs_write`` during which data is written to a file (by another process) should display on stdout the backtrace (of the code executed in kernel mode) for that execution.
If during the execution of the ``show_bt`` program a function in the kernel code is executed multiple times and these calls generate different backtraces, each of them should be printed.

With your solution include information about which functions the program does not work for and explain why.

Hint: you can use ``bcc``; the function ``attach_kprobe`` may be particularly useful.

Preparing for the Large Assignment
==================================

Build your own kernel image that will be able to run examples provided today.
You may start from the config provided for the :ref:`z2-ebpf` or the one used to build this image :download:`config-6.12.6zsobpf`.

If you want to start from your own config, you need to enable several flags in various places.
In ``menuconfig`` visit and enable at least:

- 'General setup' → 'BPF subsystem': ``CONFIG_BPF=y``, ``CONFIG_BPF_SYSCALL=y``, ``CONFIG_BPF_JIT=y``, and ``CONFIG_BPF_EVENTS=y``
- 'Kernel hacking' → 'Tracers': ``CONFIG_DYNAMIC_EVENTS=y``, ``CONFIG_KPROBES``, ``CONFIG_FUNCTION_TRACER``, ``CONFIG_DYNAMIC_FTRACE``, ``CONFIG_FPROBE``, ``CONFIG_FTRACE_SYSCALLS``, ``CONFIG_FPROBE_EVENTS``, ``CONFIG_KPROBE_EVENTS`` for the examples in this lab (kprobes)
- 'General setup': ``CONFIG_IKHEADERS=y`` for ``bcc``
- Under 'Kernel hacking': ``CONFIG_DEBUG_KERNEL`` + ``CONFIG_DEBUG_INFO_BTF`` (enabling these will likely require more than 1 GB of RAM during the build)

You may attempt to build the examples located at `samples/bpf <https://elixir.bootlin.com/linux/v6.12.6/source/samples/bpf/README.rst>`_,
however, this will most likely fail unless you use their reference config.
You may find the `cilium guide <https://docs.cilium.io/en/stable/reference-guides/bpf/toolchain/>`_ useful here.

Readings and Extra Learning
===========================

There is a nice free book by Liz Rice available here: https://isovalent.com/books/learning-ebpf/

There is also a modern tutorial available here: https://github.com/eunomia-bpf/bpf-developer-tutorial

References
==========

- [1] https://ebpf.io/
- [2] https://www.brendangregg.com/blog/2021-06-15/bpf-internals.html
- [3] https://www.brendangregg.com/blog/2019-12-02/bpf-a-new-type-of-software.html
- [4] https://www.brendangregg.com/bpf-performance-tools-book.html
- [5] https://kinvolk.io/blog/2020/09/performance-benchmark-analysis-of-egress-filtering-on-linux/
- [6] https://pchaigno.github.io/ebpf/2020/09/29/bpf-isnt-just-about-speed.html
- [7] https://www.usenix.org/legacy/publications/library/proceedings/sd93/mccanne.pdf
- [8] https://www.kernel.org/doc/html/latest/bpf/instruction-set.html
- [9] https://github.com/iovisor/bcc/