Class 12: Kernel Debugging

Date: 20.05.2025

Resources

kernel config with various debugging options enabled: config
a qemu image with a compiled kernel: https://students.mimuw.edu.pl/ZSO/PUBLIC-SO/debugging/zso2025_debug.qcow2.gz (available on students at /home/students/inf/PUBLIC/ZSO/debugging/zso2025_debug.qcow2)
the build directory with the compiled kernel (for kgdb): https://students.mimuw.edu.pl/ZSO/PUBLIC-SO/debugging/linux-6.12.6.tar.gz (available on students at /home/students/inf/PUBLIC/ZSO/debugging/linux-6.12.6)
a working crash utility: https://students.mimuw.edu.pl/ZSO/PUBLIC-SO/debugging/crash.gz (available on students at /home/students/inf/PUBLIC/ZSO/debugging/crash, also included in the qemu image)

The qemu image is incremental, and its backing file is the image from QEMU. If you want to use the image on your own machine, you can change the backing file path by running qemu-img rebase -u -b <your_backing_file_path> zso2025_debug.qcow2, where <your_backing_file_path> is a path to your identical copy of the image from QEMU.

Debugging by Printing

Using printk() for debugging kernel space code is analogous to using printf() for debugging user space code. It is very easy to use without special setup (requires CONFIG_PRINTK, which is usually enabled), making it especially good for quick checks.

printk() writes to a circular buffer: when the buffer fills up, new messages overwrite the oldest ones. The size of the buffer is set by CONFIG_LOG_BUF_SHIFT (max 32 MiB) and can be overridden using the log_buf_len kernel boot parameter (max 2 GiB). The size is always a power of 2. The buffer can be read using the dmesg command.
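The overwrite behaviour can be illustrated with a small userspace model. This is only a sketch: the real printk ring buffer stores structured records and is considerably more involved, and the names byte_ring, ring_write() and ring_read() are made up for this example.

```c
#include <stddef.h>

/* Buffer size implied by a shift value, e.g. a shift of 17 gives 128 KiB. */
size_t log_buf_size(unsigned int shift)
{
    return (size_t)1 << shift;
}

/* Hypothetical byte ring; not the kernel's actual data structure. */
struct byte_ring {
    char *data;
    size_t size;  /* must be a power of two */
    size_t head;  /* total number of bytes ever written */
};

/* Append bytes; once the ring is full, the oldest bytes are overwritten. */
void ring_write(struct byte_ring *r, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        r->data[r->head++ & (r->size - 1)] = src[i];
}

/* Copy the surviving bytes, oldest first, into out; returns their count. */
size_t ring_read(const struct byte_ring *r, char *out)
{
    size_t count = r->head < r->size ? r->head : r->size;
    size_t start = r->head - count;

    for (size_t i = 0; i < count; i++)
        out[i] = r->data[(start + i) & (r->size - 1)];
    return count;
}
```

Because the size is a power of 2, the index can wrap with a bit mask instead of a modulo operation, which is one common reason for such a size restriction.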

Instead of using printk() directly, one can use convenience macros for each log level, such as pr_info(), pr_err(), etc. These macros differ from printk() in that they first apply the pr_fmt() macro to the format string. By default, pr_fmt() does nothing, but one can define it to, for example, add a custom header to each message. Note that pr_fmt() must be defined before printk.h gets included. Additionally, pr_debug() and pr_devel() are compiled in only if DEBUG is defined.
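The effect of pr_fmt() can be tried out in userspace with a stand-in macro. In the sketch below, mock_pr_info() and log_open() are hypothetical names; the real pr_info() passes pr_fmt(fmt) to printk() together with a log-level prefix instead of formatting into a caller-supplied buffer.

```c
#include <stdio.h>

/* In a kernel source file, this define has to precede the first #include. */
#define pr_fmt(fmt) "hello: " fmt

/*
 * Userspace stand-in for pr_info() (an assumption for illustration).
 * ##__VA_ARGS__ is a GNU extension that the kernel relies on as well.
 */
#define mock_pr_info(buf, len, fmt, ...) \
    snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)

/* Hypothetical helper: formats a message a driver's open handler might log. */
int log_open(char *buf, size_t len, int minor)
{
    return mock_pr_info(buf, len, "opened minor %d\n", minor);
}
```

The header is concatenated with the format string at compile time, so it costs nothing at runtime and cannot be changed per call.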

There are also printk_once(), which prints a message only once, and printk_ratelimited(), which prints messages at a limited rate. Both macros have corresponding pr_*() convenience macros. Note that the files /proc/sys/kernel/printk_ratelimit and /proc/sys/kernel/printk_ratelimit_burst control only the printk_ratelimit() function (not printk_ratelimited()), all of whose call sites share the limiting state. All other rate-limited print functions have their parameters hardcoded in the kernel.
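The idea behind interval/burst rate limiting can be sketched as a userspace model. The names rl_state and rl_allow() are hypothetical, and the kernel's implementation differs in details such as locking and counting suppressed messages.

```c
#include <stdbool.h>

/* Hypothetical rate-limit state: at most `burst` messages per `interval`. */
struct rl_state {
    long interval;  /* window length, in arbitrary time units */
    int burst;      /* messages allowed per window */
    long begin;     /* start of the current window */
    int printed;    /* messages let through in the current window */
};

/* Returns true if a message arriving at time `now` may be printed. */
bool rl_allow(struct rl_state *rs, long now)
{
    if (now - rs->begin >= rs->interval) {
        /* The window has expired: start a new one. */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    return false;
}
```

In this model, sharing one rl_state between several call sites means that a noisy call site can use up the burst and silence the others, which is the practical consequence of the shared limiting state mentioned above.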

Lastly, all the aforementioned macros have dev_*() versions (e.g. dev_printk(), dev_info()) that should be used in device drivers and include additional information about the device in each message. Other useful macros are print_hex_dump(), print_hex_dump_debug() and print_hex_dump_bytes().

The kernel's log level can be controlled either via /proc/sys/kernel/printk or using the boot parameters loglevel or ignore_loglevel (see https://www.kernel.org/doc/html/latest/core-api/printk-basics.html).
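The file /proc/sys/kernel/printk holds four numbers. The sketch below parses such a line and applies the basic rule that a message reaches the console only if its level is numerically lower than the console log level. The struct and function names are made up for this example, and ignore_loglevel is not modelled.

```c
#include <stdbool.h>
#include <stdio.h>

/* The four values in /proc/sys/kernel/printk, in the order they appear. */
struct printk_levels {
    int console;          /* current console log level */
    int default_message;  /* level of messages without an explicit one */
    int minimum_console;  /* lowest value the console level may be set to */
    int default_console;  /* default console log level */
};

/* Parse one line of the file; returns false if it is malformed. */
bool parse_printk_levels(const char *line, struct printk_levels *out)
{
    return sscanf(line, "%d %d %d %d", &out->console, &out->default_message,
                  &out->minimum_console, &out->default_console) == 4;
}

/* Lower numbers mean higher priority (0 is KERN_EMERG, 7 is KERN_DEBUG). */
bool shown_on_console(int msg_level, const struct printk_levels *lv)
{
    return msg_level < lv->console;
}
```

Messages that are filtered from the console still land in the log buffer and remain visible in dmesg.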

Dynamic Debug

If CONFIG_DYNAMIC_DEBUG is set, the functions pr_debug(), dev_dbg(), print_hex_dump_debug() and print_hex_dump_bytes() use dynamic debugging: instead of being enabled at compile time by defining DEBUG, they can be enabled at runtime. For instructions on how to control dynamic debugging, see https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html (the Examples section is especially helpful; ddcmd is an alias defined earlier in that document).
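Dynamic debug is controlled by writing short text queries to its control file, as described in the linked how-to. As a rough sketch, the hypothetical helper below builds a query that turns printing on or off for all call sites in one function; ddebug_func_query() is not a kernel or libc function.

```c
#include <stdio.h>

/*
 * Build a query such as "func hello_ioctl +p" ("+p" enables printing,
 * "-p" disables it). Returns what snprintf() returns.
 */
int ddebug_func_query(char *buf, size_t len, const char *func, int enable)
{
    return snprintf(buf, len, "func %s %cp", func, enable ? '+' : '-');
}
```

The resulting string would then be written to the dynamic debug control file, for example with echo from a root shell.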

Hands-on

set log_buf_len to 64 MiB
add a custom header to messages printed in the hello device from Class 9: Character Devices
add a rate-limited warning message to hello_open()
add an info message to hello_release() that gets printed only once
in hello_read(), print a hexdump of the read bytes
add a dynamic debug message to hello_ioctl()
verify that your changes work correctly

Kernel debuggers

The kernel has two debugger frontends: kdb and kgdb. kgdb is much more powerful: it allows you to use gdb with additional scripts for inspecting the kernel state. kdb currently supports setting breakpoints and single-stepping, and has some of the kernel inspection capabilities of kgdb, but it can no longer display code disassembly. The advantage of kdb is that it runs on the machine being debugged, whereas kgdb needs a second machine running gdb.

If you want to use kgdb, the following kernel options are necessary or recommended: CONFIG_KGDB=y, CONFIG_KGDB_SERIAL_CONSOLE=y, CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y, CONFIG_GDB_SCRIPTS=y, CONFIG_DEBUG_INFO_REDUCED=n.

For kdb, the options are: CONFIG_KGDB=y, CONFIG_KGDB_KDB=y, CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y, CONFIG_KDB_KEYBOARD=y, CONFIG_KGDB_SERIAL_CONSOLE=y, CONFIG_DEBUG_INFO_REDUCED=n.

Hands-on

follow the guide to debug the kernel with kgdb: https://www.kernel.org/doc/html/latest/process/debugging/gdb-kernel-debugging.html
- any necessary kernel config options are already set in the provided image, but you need to set nokaslr
- run kgdb from the build directory provided in Resources
- try the commands from the guide
- to observe loading a module, you can use dummy, which is available in the provided kernel (modprobe dummy)
- note that the interface of lx_per_cpu() has changed slightly: unlike in the guide, the name of the per-cpu variable should be provided without quotes (and tab-completion works)
- play with other kgdb commands (run apropos lx for a list)
- use gdb functions (cheatsheets such as https://github.com/reveng007/GDB-Cheat-Sheet are useful)
  - set breakpoints and try to trigger them, e.g. set a breakpoint on do_open() and then read a file
  - examine the stack trace at a breakpoint, in particular the arguments and local variables of the different functions in the stack trace
  - examine the registers
  - examine the surrounding code in C and assembly
  - single-step through the function at source code and assembly level, enter some of the function calls. Try different layouts (asm, src, split)
  - run the function until return
  - set a watchpoint on a variable and trigger it
  - set a conditional breakpoint and trigger it
  - delete and disable some breakpoints
  - print an expression
  - print a type definition
debug the kernel with kdb
- to enter kdb, you need to configure its I/O and then trigger it with SysRq. You can do this by running echo kbd > /sys/module/kgdboc/parameters/kgdboc; echo g > /proc/sysrq-trigger
- see what you can do with kdb, in particular which of the previous steps you can repeat (run help for a command list)

References:

https://www.kernel.org/doc/html/latest/process/debugging/kgdb.html

Crash dumps

A bug may cause the kernel to crash or hang. In such a case, generating a crash dump that contains information about the kernel's state at the time of the crash can be helpful. Crash dumps can be generated using kdump, which is a mechanism that utilizes kexec to boot a second kernel that captures the crash dump in case of a crash.

There are several ways to force a kernel crash on purpose. The simplest one is by issuing the magic SysRq command c. To issue this command to the VM you cannot use the keyboard normally, since the command would be interpreted by the host; one way of issuing the command that works on the VM is by writing to /proc/sysrq-trigger.

Another way to simulate errors and crashes is by using the Linux Kernel Dump Test Module (LKDTM) (CONFIG_LKDTM). It can be controlled from DebugFS (see https://www.kernel.org/doc/html/latest/fault-injection/provoke-crashes.html).

Hands-on

enable crash dumps by installing kdump-tools via apt
- select No for kexec-tools handling reboots and Yes for enabling kdump-tools
- since the kernel with many debugging options enabled requires more memory, after installing you need to increase the crashkernel size by changing crashkernel=384M-:128M to crashkernel=384M-:256M in /etc/default/grub.d/kdump-tools.cfg and running sudo update-grub
- then reboot
force a crash using Magic SysRq
force some errors and crashes using the LKDTM. Experiment with different ones, such as WARNING, LOOP, PANIC, BUG, etc. (a full list can be found by reading the file /sys/kernel/debug/provoke-crash/DIRECT). Try different crash points, such as DIRECT (trigger immediately) and INT_HW_IRQ_EN (trigger on handle_irq_event()).
examine both crash dumps using the crash command
- the crash installed from the Debian repository seems to have some issue with gdb crashing, so instead use the provided /home/zso/crash
- pass the vmlinux from the kernel compilation directory to crash
- examine the dump using the different available commands (see man crash or help in the crash prompt). In particular check the backtrace, processes, machine information, the kernel log, registers and the failing code. Remember that you can run gdb commands inside crash (if a gdb command name conflicts with a crash command, run it as gdb <command>)

Stack traces

If the kernel detects a bug and does not crash, it prints a stack trace. The stack trace contains information such as the function call trace, register values and loaded modules. Some scripts in the kernel source tree help when working with stack traces:

scripts/decodecode - disassembles the code bytes printed by kernel oopses
scripts/decode_stacktrace.sh - converts byte offsets in the function call trace to line numbers
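Each entry in the function call trace has the form symbol+0xoffset/0xsize, where the offset is the position inside the function and the size is the function's length in bytes. As an illustration of what the scripts start from, here is a minimal sketch that splits such an entry into its parts; trace_frame and parse_frame() are hypothetical names, and the entry used in the usage note is made up.

```c
#include <stdbool.h>
#include <stdio.h>

/* One call-trace entry, e.g. "hello_read+0x2c/0x90". */
struct trace_frame {
    char symbol[64];
    unsigned long offset;  /* byte offset into the function */
    unsigned long size;    /* total size of the function in bytes */
};

/* Returns true if text looks like symbol+0xoffset/0xsize. */
bool parse_frame(const char *text, struct trace_frame *out)
{
    return sscanf(text, " %63[^+ ]+%lx/%lx",
                  out->symbol, &out->offset, &out->size) == 3;
}
```

For "hello_read+0x2c/0x90" this yields the symbol hello_read, offset 0x2c and size 0x90; mapping that pair to a source line additionally requires the debug information in vmlinux or the module file.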

Hands-on

use decode_stacktrace.sh (which also uses decodecode) to examine the stacktraces from the crashes you triggered (take the stacktraces from the dmesg dumps)

References:

https://www.kernel.org/doc/html/latest/admin-guide/bug-hunting.html

Runtime error checkers

The kernel has several mechanisms for error checking at runtime, in particular:

KASAN (Kernel Address Sanitizer) - helps find use-after-free and out-of-bounds bugs (CONFIG_KASAN). Alternatively, you can use KFENCE (Kernel Electric-Fence), which has lower precision, but also a lower overhead that makes it suitable for production code.
UBSAN (Undefined Behavior Sanitizer) - detects undefined behavior, such as a bit shift by a negative value (CONFIG_UBSAN)
lockdep (Lock Dependency Validator) - detects potential deadlocks and other locking-related issues (CONFIG_PROVE_LOCKING)
KCSAN (Kernel Concurrency Sanitizer) - detects data races (CONFIG_KCSAN)
Kmemleak (Kernel Memory Leak Detector) - detects memory leaks (by default scans the memory every 10 minutes) (CONFIG_DEBUG_KMEMLEAK)
KMSAN (Kernel Memory Sanitizer) - detects uses of uninitialized values (CONFIG_KMSAN)

Hands-on

introduce bugs into the hello module that will be detected by the above checkers and verify that they are detected (note that in the provided kernel KCSAN and KMSAN are disabled, since they are incompatible with KASAN)

DebugFS

DebugFS (CONFIG_DEBUG_FS) makes it easy to expose kernel variables to user space for read or write access via files under /sys/kernel/debug. Any struct file_operations can be provided for these files, but there are also convenient helpers for creating files that access integer variables or (read-only) binary blobs and blocks of registers. See https://www.kernel.org/doc/html/latest/process/debugging/driver_development_debugging_guide.html#id9 and https://www.kernel.org/doc/html/latest/filesystems/debugfs.html for details.

Note that if you need to transfer large quantities of data from the kernel to user space, DebugFS can be used in conjunction with the relay interface to create a circular buffer that can be written to by the kernel and read in user space by reading a DebugFS file (see https://docs.kernel.org/filesystems/relay.html).

Hands-on

expose the hello_repeats variable from the hello module for read-write access via DebugFS
verify that the file works correctly

Fault injection

Some kernel functions support fault injection: they can be forced to return an error regardless of whether there was an actual error. The fault injection mechanism can be controlled via DebugFS and provides several parameters that specify when and how the injection should happen, for example depending on the stacktrace (see https://www.kernel.org/doc/html/latest/fault-injection/fault-injection.html).

The related kernel config options are CONFIG_FAULT_INJECTION and CONFIG_FAULT_INJECTION_DEBUG_FS, plus one option for each failure type, e.g. CONFIG_FAILSLAB for slab allocation failures or CONFIG_FUNCTION_ERROR_INJECTION for injecting specific error return values.
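To get a feel for how the parameters interact, here is a simplified userspace model of the decision "should this call fail?", based loosely on the parameters described in the linked documentation. Only probability, interval and times are modelled; filters based on tasks or stack traces, as well as space and verbose, are left out. The names fault_model and model_should_fail() are hypothetical, and the random number is passed in so that the behaviour is reproducible.

```c
#include <stdbool.h>

/* Hypothetical subset of the fault-injection parameters. */
struct fault_model {
    int probability;      /* chance of failure, in percent */
    int interval;         /* if > 1, only every interval-th call can fail */
    int times;            /* failures still allowed; -1 means no limit */
    unsigned long count;  /* calls seen so far (used with interval) */
};

/* `roll` stands in for a random number in the range 0..99. */
bool model_should_fail(struct fault_model *m, int roll)
{
    if (m->probability == 0 || m->times == 0)
        return false;
    if (m->interval > 1 && ++m->count % m->interval != 0)
        return false;
    if (m->probability <= roll)
        return false;
    if (m->times != -1)
        m->times--;
    return true;
}
```

With probability set to 100 and times set to 1, the model fails exactly one call and then stays quiet, which is the kind of setup needed to inject a single failure.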

Hands-on

inject a single user memory access failure with verbosity set to 2
examine the kernel log