Class 4: Kernel modules

Date: 20.03.2018

What is a module?

A module is relocatable code / data that can be inserted into and removed from the kernel while the system is running. The module may refer to (exported) kernel symbols as if it were compiled as part of the kernel, and may itself provide (export) symbols that other modules may use. A module is responsible for a certain service in the kernel – for example, modules can be device and file system drivers, network filters, cryptographic algorithms, etc.

The modules are compiled for a specific version and configuration of the kernel – the use of modules from other versions of the kernel (or the same version with significantly different configuration) will probably fail. The module loading system will try to detect and prevent such a situation.

Creating modules

The kernel modules (as well as the main kernel code) are written in C (using other languages is not possible). The environment inside the kernel, however, is quite specific and differs greatly from writing an ordinary program in the user space.

Kernel code is written according to the official coding style – https://www.kernel.org/doc/html/v4.15/process/coding-style.html .

Compilation of modules

To compile modules, you need a directory with the configured and compiled kernel source. In principle, the header files and the configuration are enough, but separating the appropriate files from the rest is a complicated process, which in practice only Linux distributions (with a large number of their own scripts) carry out. Modules, like the kernel itself, are built by the Kbuild system, a fairly complex layer on top of make.

To compile our module, we need to create a Kbuild file describing our code, for example:

obj-m := module.o different_module.o

compiles the module.c file to the module.ko module, and the different_module.c file to the different_module.ko file.

If we want to combine several source files into one module, we can do it as follows:

obj-m := module.o
module-objs := module_p1.o module_p2.o

This Kbuild file will compile the module_p1.c and module_p2.c files and combine them into the module.ko module.

To build the module, run make in the kernel source directory, pointing it to our directory with external modules:

make -C /usr/src/linux-<version> M=/home/<user>/my_modules

For simplicity, you can write your own Makefile that calls the appropriate command (see example).

Module metadata

Each module can (but does not have to) define metadata using the following macros (defined in linux/module.h):

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Horse Fred");
MODULE_DESCRIPTION("Driver for my device");

Metadata defined in this way are stored (along with many other data) in the .modinfo section of the finished module and can be printed with the modinfo command.

Choosing a license has an important and non-obvious effect – using a GPL-compatible license will allow us to use kernel symbols marked as available only for modules licensed under the GPL. The following are recognized as compatible licenses:

  • "GPL" – GNU General Public License v2 or later,
  • "GPL v2" – GNU General Public License v2,
  • "GPL and additional rights" – GNU General Public License v2 + additional rights,
  • "Dual BSD/GPL" – GNU General Public License v2 or BSD license to choose from,
  • "Dual MPL/GPL" – GNU General Public License v2 or Mozilla Public License to choose from,
  • "Dual MIT/GPL" – GNU General Public License v2 or MIT license to choose from.

Module constructor and destructor

Modules do not have a main function or their own process / thread (unless they create it themselves, but it is quite rare). Instead, the module’s code is called by various kernel subsystems when there is something to do for it.

Each module can define a function initiating the module (constructor) and releasing the module (destructor). These functions are defined as follows:

int my_init_function(void) {
    /* ... */
}
void my_cleanup_function(void) {
    /* ... */
}
module_init(my_init_function);
module_exit(my_cleanup_function);

The init function is called when the module is loaded. If everything went well, it should return 0. If it failed to initialize the module, it should return the error code (negated code from errno*.h) – the module will be immediately removed by the kernel.

The cleanup function is called when the module is removed (but is not called when the init function has returned an error).

The task of the init function is to register the functionality provided by the module into the kernel structures – for example, a PCI device driver will in this function inform the PCI subsystem of supported devices and functions that should be triggered when a matching device is detected. Without such registration, the kernel will never call the code of our module, so modules without an initializing function are basically only useful as code libraries for other modules.

The task of the cleanup function is to reverse everything that the init function has done and clean after all the module’s activity. If the module has an initiating function, always provide a cleanup function (even if it has to be empty) – otherwise, the kernel will reason that our module does not support removal and will not allow rmmod to be executed on it.

Sometimes you can find older modules that use functions with the default names of init_module() and cleanup_module(), without declaring them with module_init() and module_exit(). This is not recommended in current kernel versions.

A module should have only one constructor and only one destructor.

printk

For the purposes of debugging and informing about important events, you can use the printk function, which is similar to printf:

printk(KERN_WARNING "Failed, error code: %d\n", err);

The message is preceded by its priority (note that there is no comma between the priority and the format string), which may be one of the following (in ascending order of importance):

  • KERN_DEBUG
  • KERN_INFO
  • KERN_NOTICE
  • KERN_WARNING
  • KERN_ERR
  • KERN_CRIT
  • KERN_ALERT
  • KERN_EMERG

The messages printed by printk will be available in the system log, which can be viewed using the dmesg command. If they have a high enough priority, they will also be immediately written to the console.
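
The missing comma works because each priority macro expands to a short string literal, which the C compiler concatenates with the format string. The user-space sketch below imitates this mechanism so it can be tried without loading a module. The macro values mirror linux/kern_levels.h in recent kernels (an assumption for other versions), and the helpers level_of and text_of are hypothetical.

```c
/* Stand-ins for the kernel macros: a start-of-header byte plus a digit. */
#define KERN_SOH     "\001"
#define KERN_ERR     KERN_SOH "3"
#define KERN_WARNING KERN_SOH "4"
#define KERN_INFO    KERN_SOH "6"

/* Hypothetical helper: return the numeric level encoded in a message,
 * or -1 when the message carries no priority prefix. */
static int level_of(const char *msg)
{
	if (msg[0] == '\001' && msg[1] >= '0' && msg[1] <= '7')
		return msg[1] - '0';
	return -1;
}

/* Hypothetical helper: return the message text that follows the prefix. */
static const char *text_of(const char *msg)
{
	return level_of(msg) >= 0 ? msg + 2 : msg;
}
```

Note that a smaller number means a more important message: KERN_EMERG corresponds to 0 and KERN_DEBUG to 7.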

The first example module shows the use of printk as well as the constructor and the destructor.
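
For orientation, a minimal module combining metadata, a constructor, a destructor and printk might look like the sketch below. It is not the course's example file: the names hello_init and hello_exit are illustrative, and the __init / __exit annotations (from linux/init.h) are optional hints that let the kernel discard code that is no longer needed.

```c
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Minimal example module (illustrative)");

/* Constructor: called once when the module is loaded. */
static int __init hello_init(void)
{
	printk(KERN_INFO "hello: module loaded\n");
	return 0;	/* 0 means success; a negative errno would abort loading */
}

/* Destructor: called once when the module is removed. */
static void __exit hello_exit(void)
{
	printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
```

After insmod and rmmod, both messages should be visible in the output of dmesg.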

Using external symbols

In modules, you can freely use symbols defined and exported by the main kernel code and by other modules (you can view them in the file /proc/kallsyms).

In order for a symbol of our module to be visible from the outside, it should be exported with the macro EXPORT_SYMBOL:

int my_function(int x) {
    /* ... */
}
EXPORT_SYMBOL(my_function);

There is also a similar macro EXPORT_SYMBOL_GPL, exporting a symbol only for modules under the GPL (or compatible) license.

The depmod program automatically collects information on dependencies between modules resulting from the use of exported symbols. The modprobe program uses this information to load the modules in the correct order.

The second example module shows the export of symbols and the use of exported symbols.
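
The consumer side could be sketched as follows, assuming that another module exports my_function as shown above and that both modules are built from the same directory, so that the build can resolve the symbol. The file name consumer.c and the function names are hypothetical.

```c
/* consumer.c: hypothetical module using a symbol exported elsewhere. */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

/* Normally this declaration lives in a header shared with the exporter. */
extern int my_function(int x);

static int __init consumer_init(void)
{
	printk(KERN_INFO "consumer: my_function(2) = %d\n", my_function(2));
	return 0;
}

/* Empty destructor, present so that the module can be removed. */
static void __exit consumer_exit(void)
{
}

module_init(consumer_init);
module_exit(consumer_exit);
```

Loading the consumer with insmod before the exporting module is loaded fails with an unresolved symbol error; modprobe avoids this by loading the dependency first.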

Parameterization of modules

You can declare that a specified variable will contain a parameter that can be changed when the module is loaded. The name of the parameter is the same as the variable name.

When the module is being loaded, the values given by the user (if any) will be stored in the given variables, e.g.:

insmod module.ko irq=5

stores the value of 5 into the irq variable.

To declare that a variable is to be used as a parameter of the module use the following macro:

module_param(variable, type, permissions);

Types can be: byte, short, ushort, int, uint, long, ulong, charp, bool, invbool. The charp type is used to pass strings (char *). The invbool type means a bool parameter, which is a negation of the value given by the user.

You can define your own parameter types – in old kernels this required defining the functions param_get_XXX, param_set_XXX and param_check_XXX; in current kernels you define a struct kernel_param_ops named param_ops_XXX (holding the set and get functions) together with param_check_XXX.

permissions means the file permissions that will be given to the parameter in sysfs.

Each parameter should have a description. The description of the parameter can be read along with the description of the entire module using the modinfo program, thanks to which the module carries a description of its use. The description is given by the macro MODULE_PARM_DESC:

MODULE_PARM_DESC(variable, description);

Examples:

int irq = 7;
module_param(irq, int, 0);
MODULE_PARM_DESC(irq, "Irq used for device");

char *path="/sbin/modprobe";
module_param(path, charp, 0);
MODULE_PARM_DESC(path, "Path to modprobe");

Use:

printk(KERN_INFO "Using irq: %d\n", irq);
printk(KERN_INFO "Will use path: %s\n", path);

To declare an array of parameters you must use a different macro:

module_param_array(variable, type, pointer_to_count, permissions);

All fields except pointer_to_count have the same meaning as in module_param(). pointer_to_count is a pointer to the variable to which the number of elements in the array will be written. If you are not interested in the number of arguments, you can specify NULL, but then you need to recognize whether an element is present based on its contents, which is not recommended. The maximum number of array elements is determined by the array declaration, e.g. if we declare its size as 4, then the user will be able to pass a maximum of 4 elements. In the description of an array parameter, the maximum number of elements is normally placed in square brackets.

Example:

int num_paths = 2;
char *paths[4] = {"/bin", "/sbin", NULL, NULL};
module_param_array(paths, charp, &num_paths, 0);
MODULE_PARM_DESC(paths, "Search paths [4]");

Use:

int i;
for (i=0; i<num_paths; ++i)
    printk(KERN_INFO "Path[%d]: %s\n", i, paths[i]);

The third example module shows the use of parameters.
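
A complete module with a single parameter might look like the sketch below (function names are illustrative). It uses permissions 0444 instead of 0, which, as far as the sysfs behaviour goes, makes the current value readable in /sys/module/<module name>/parameters/irq; with permissions 0 no such file is created.

```c
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

/* Default value, used when the user does not pass irq=... */
static int irq = 7;
module_param(irq, int, 0444);	/* world-readable in sysfs, not writable */
MODULE_PARM_DESC(irq, "Irq used for device");

static int __init params_init(void)
{
	/* By now the value given on the insmod command line is already stored. */
	printk(KERN_INFO "params: using irq %d\n", irq);
	return 0;
}

static void __exit params_exit(void)
{
}

module_init(params_init);
module_exit(params_exit);
```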

Error handling in the kernel

When writing kernel-mode code, keep in mind that the stability of the entire system depends on the correct operation of our code. It is absolutely necessary to handle all possible errors in our module in a way that does not interfere with the rest of the kernel, and to maintain strict control over the lifetime of allocated resources (memory leaks in the kernel make it necessary to restart the whole system periodically).

Most non-trivial functions in the kernel can fail and return an error code as a result. To describe the error encountered, numeric error codes are used, the same as errno in user code, but negated (i.e., a function that detects a permission error executes return -EPERM;). The range of numbers for such error codes is -4095 .. -1 (MAX_ERRNO in linux/err.h is 4095).

There are 4 conventions for returning error codes from functions in the kernel (and before using a function you should always check which conventions it uses):

  • the function does not return a result other than the error code – the returned type is int, the return value is the error code, or 0 in case of success.
  • the function returns a numeric type (int, long, off_t, …) – values -4095 .. -1 mean an error code, the remaining values mean a “normal” result.
  • the function returns a pointer – in the event of an error, the negative error code is cast to the pointer type and returned. The caller needs to check whether the returned pointer is a cast error code.
  • the function returns a pointer and does not use error codes – in the event of an error, NULL is returned, and the caller must infer the appropriate error code (an example of such a function is kmalloc – if it returns NULL, your code should treat that as -ENOMEM).

The following macros are useful for handling error codes (linux/err.h):

IS_ERR_VALUE(x)
true if x (an integer) is an error code (i.e., -4095 .. -1)
void *ERR_PTR(long error)
converts the error code from a number to a pointer
long PTR_ERR(const void *ptr)
converts in the opposite direction
bool IS_ERR(const void *ptr)
true if the pointer is an error code
bool IS_ERR_OR_NULL(const void *ptr)
true if the pointer is an error code or NULL
void *ERR_CAST(const void *ptr)
converts an error code stored in a pointer of one type to a pointer of another type (useful when a function returning a different pointer type has to forward the error)
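
To see how an error code fits into a pointer, the encoding can be imitated in user space. The sketch below is not the kernel's implementation: err_ptr, ptr_err and is_err are simplified stand-ins for ERR_PTR, PTR_ERR and IS_ERR, find_entry is a hypothetical function, and the limit of 4095 matches MAX_ERRNO from linux/err.h.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* same limit as in the kernel's linux/err.h */

/* User-space imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int table[4] = { 10, 20, 30, 40 };

/* Hypothetical lookup following the "pointer or encoded error" convention. */
static int *find_entry(int index)
{
	if (index < 0)
		return err_ptr(-EINVAL);
	if (index >= 4)
		return err_ptr(-ENOENT);
	return &table[index];
}
```

A caller first checks is_err(p) and only then dereferences p; on error, ptr_err(p) recovers the original negative code so that it can be passed further up. NULL is not an error pointer in this scheme, which is why IS_ERR_OR_NULL exists as a separate check.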

If the function used by us returns an error, always remember (unless we have a special error handling planned) to clean up all resources allocated in the current function and to forward the unmodified error code upstream. A quite common (and recommended) idiom in the kernel is to use goto to reach a common error path:

resource_c *give_c(void) {
    resource_a *a;
    resource_c *c;
    long b;    /* long, so that IS_ERR_VALUE() can be applied safely */
    int res;

    mutex_lock(&lock);
    a = give_a();
    if (IS_ERR(a)) {
        res = PTR_ERR(a);
        goto err_a;
    }
    b = give_b();
    if (IS_ERR_VALUE(b)) {
        res = b;
        goto err_b;
    }
    c = kmalloc(sizeof *c, GFP_KERNEL);
    if (c == NULL) {
        res = -ENOMEM;
        goto err_c;
    }
    c->a = a;
    c->b = b;
    mutex_unlock(&lock);
    return c;

    /* Common error handling: undo the completed steps in reverse order */
err_c:
    release_b(b);
err_b:
    release_a(a);
err_a:
    mutex_unlock(&lock);
    return ERR_PTR(res);
}

A list of error codes can be found in asm-generic/errno-base.h and asm-generic/errno.h. It should be remembered that many of these errors have strictly defined semantics, sometimes loosely related to the description, and should only be used in specific situations. The most important codes that should be mentioned:

-EFAULT
Error when copying from / to user memory (any other usage is incorrect).
-ENOMEM
Out of system memory (RAM), but not of other types of resources.
-ENOSPC
No space left on a disk or on another sufficiently similar device.
-ENOENT
The specified file (or other sufficiently similar resource) was not found.
-ESRCH
The specified process was not found.
-EPERM
There are no (loosely defined) permissions to perform the operation.
-EACCES
The operation is forbidden by permissions on the file system.
-EEXIST
The operation failed because the file (or other resource) already exists (used, for example, for operations that create files).
-EIO
The device broke down in an undefined manner, not due to the caller’s fault (scratched CD, etc.).
-EINVAL
The user provided incorrect parameters (contradictory, not supported by the device, etc.).
-ENOTTY
An attempt to perform an operation on an incompatible device type (e.g. an attempt to change the terminal settings on a regular file). Used primarily for rejecting unknown ioctls.
-ERESTARTSYS
Used to interrupt waits when it is necessary to exit the kernel to deliver a signal to the user process – in the right place it will be converted to -EINTR or trigger restarting the system call.
-EINTR
The system call was interrupted by a signal – should not be used directly (instead, return -ERESTARTSYS).
-ESPIPE
An attempt to change the position of a file on an object in which such a concept does not make sense (pipe, socket, terminal …).

Returning -1 instead of an error code, using an obviously incorrect error code, or unnecessarily throwing out the error code returned by a called function will be worth negative points in the assignments.

Dynamic memory allocation for the kernel

There are many functions in the kernel that allow for dynamic memory allocation. The most important and most commonly used is kmalloc (linux/slab.h):

void *kmalloc(size_t size, gfp_t flags);
void kfree(const void *obj);

The kmalloc function allocates a physically contiguous memory area. The size of a single allocation is limited: in old kernels the limit was 32 pages (slightly less than 128 KB on x86, because part of the memory was reserved for a block header), while current kernels define it as KMALLOC_MAX_SIZE, whose value depends on the architecture and configuration. Allocation is fast: small objects come from slab caches, which in turn obtain their pages from the buddy allocator. The flags parameter specifies the type of memory (constants GFP_* defined in the file linux/gfp.h) – the most important are:

  • GFP_KERNEL – the most commonly used one; may block, so it can only be used in process context,
  • GFP_ATOMIC – does not block, so it can be used from interrupt service routines (although allocating there is usually a bad idea).

void *vmalloc(unsigned long size);
void vfree(const void *addr);

With vmalloc (linux/vmalloc.h) you can allocate an area of any size (provided that there is enough free physical memory), but it is not physically contiguous: the area is contiguous only in virtual addresses and goes through address translation.

struct page *alloc_pages(gfp_t flags, unsigned int order);
void __free_pages(struct page *page, unsigned int order);

Allocates 2 ** order whole, physically contiguous pages; the flags parameter specifies how to allocate the pages (as in kmalloc).
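
Because the page allocator works in powers of two, a requested size has to be rounded up to the next order. The user-space sketch below shows the arithmetic, assuming 4096-byte pages (as on x86). bytes_for_order and order_for_size are hypothetical helpers; in kernel code the get_order() function plays the role of the latter.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KiB pages, as on x86 */

/* Number of bytes covered by an allocation of the given order. */
static unsigned long bytes_for_order(unsigned int order)
{
	return PAGE_SIZE << order;
}

/* Smallest order whose allocation holds at least size bytes. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	while (bytes_for_order(order) < size)
		order++;
	return order;
}
```

For example, a 20000-byte buffer needs order 3, i.e. 8 pages or 32768 bytes, so a noticeable part of the allocation stays unused.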

The fourth example shows the use of the kmalloc function.
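
A typical kmalloc call site could look like the fragment below. It is only a sketch: copy_buffer is a hypothetical helper, and it follows the first error convention described earlier (int result, 0 on success, negative error code on failure).

```c
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>

/* Hypothetical helper: duplicate len bytes of src into a new buffer. */
static int copy_buffer(const void *src, size_t len, void **out)
{
	/* GFP_KERNEL may sleep, so this must run in process context. */
	void *buf = kmalloc(len, GFP_KERNEL);

	if (!buf)
		return -ENOMEM;	/* kmalloc reports failure only as NULL */

	memcpy(buf, src, len);
	*out = buf;		/* the caller releases it with kfree() */
	return 0;
}
```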

A private heap

When we have a lot of objects of identical lengths, it may be useful to create our own heap designed specifically for the given type of objects. The following functions are used for this purpose:

struct kmem_cache *kmem_cache_create(
    const char *name, size_t size, size_t align,
    unsigned long flags,
    void (*ctor)(void *));

void kmem_cache_destroy(struct kmem_cache *cachep);

The parameter flags is usually set to 0 (most flags are for debugging purposes only).

struct kmem_cache (called kmem_cache_t in old kernels) is our private heap – it consists of dynamically allocated pages cut into fragments of exactly the specified length with minimal overhead, additionally arranged to make the best use of the CPU cache. We can allocate our objects on it with the following functions:

void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags);
void kmem_cache_free(struct kmem_cache *cachep, void *objp);

The memory for the new object is initialized using the constructor specified when creating the cache. For the convenience of simple cases (if the constructor is not needed), the macro KMEM_CACHE wrapping kmem_cache_create is defined.
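
The life cycle of such a cache inside a module might be sketched as follows. The type struct my_object and all function names are hypothetical, no constructor is used (NULL), and the prototypes are those of current kernels, where the cache type is called struct kmem_cache.

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

MODULE_LICENSE("GPL");

struct my_object {		/* hypothetical object type */
	int id;
	char name[32];
};

static struct kmem_cache *my_cache;

static int __init cache_init(void)
{
	struct my_object *obj;

	my_cache = kmem_cache_create("my_object", sizeof(struct my_object),
				     0, 0, NULL);
	if (!my_cache)
		return -ENOMEM;

	/* Allocate and release one object, just to show the calls. */
	obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
	if (!obj) {
		kmem_cache_destroy(my_cache);
		return -ENOMEM;
	}
	obj->id = 1;
	kmem_cache_free(my_cache, obj);
	return 0;
}

static void __exit cache_exit(void)
{
	/* All objects must have been freed before the cache is destroyed. */
	kmem_cache_destroy(my_cache);
}

module_init(cache_init);
module_exit(cache_exit);
```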

Automatic loading of required modules – kmod

Kmod is a kernel subsystem that loads modules “on demand”, i.e. when there is a call to a service related to the given module.

When a user requests access to a device that is supported by a module that is not loaded, the kernel suspends execution of the program and calls the function request_module(), requesting the loading of the appropriate module. This function is provided by kmod and works by executing a user-space program for the requested module (/sbin/modprobe by default, but this can be changed through /proc/sys/kernel/modprobe).

If module loading on demand is to be used in the module, then include:

#include <linux/kmod.h>

On-demand loading is possible thanks to the function:

int request_module(const char *fmt, ...);

The module name is given as a printf-style format string followed by its arguments.
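
A call could be wrapped as in the sketch below. load_helper_module is a hypothetical helper. request_module returns 0 on success, a negative error code, or a positive exit status of the helper program; mapping a positive status to -ENOENT is a choice made for this sketch, not a kernel rule.

```c
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/kmod.h>

/* Hypothetical helper: ask kmod to load the module called name. */
static int load_helper_module(const char *name)
{
	/* May sleep while modprobe runs, so call it from process context. */
	int ret = request_module("%s", name);

	if (ret < 0)
		return ret;		/* forward the kernel's error code */
	if (ret > 0) {
		printk(KERN_WARNING "modprobe failed for %s (status %d)\n",
		       name, ret);
		return -ENOENT;		/* assumption: treat as "not found" */
	}
	return 0;
}
```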

Reference count

Each module has its own reference count – as long as it is positive, the kernel will not allow the module to be removed. It should be increased when our module is in active use (e.g. it handles an open device or a mounted file system). The management of such a counter is usually done by other kernel subsystems, but you have to help them by passing the pointer to your module (macro THIS_MODULE). For example, for a character device driver, you must fill the owner field of the file_operations structure with this pointer.
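
For a character device driver this amounts to a single field, as in the fragment below (my_open and my_fops are hypothetical names, and registration of the device is omitted):

```c
#include <linux/fs.h>
#include <linux/module.h>

static int my_open(struct inode *inode, struct file *file)
{
	return 0;
}

static const struct file_operations my_fops = {
	/* Lets the kernel hold a reference to the module while the
	 * device is open, so that rmmod is refused during that time. */
	.owner = THIS_MODULE,
	.open  = my_open,
};
```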

Libraries

Inside the kernel, it is not possible to use any libraries known from the user space, even the standard C library. However, the kernel has its own library of basic functions, containing many functions known from the standard C library or very similar to them, including:

  • most of the functions known from string.h (memcpy, strcmp, strcpy, …)
  • kstrto[u](int|l|ll): functions converting from strings to numbers, similar to the standard strto*, but with a different interface
  • malloc, free, calloc: do not exist, replaced by kmalloc, vmalloc, and several other memory allocators depending on your needs
  • snprintf, sscanf: they work similarly to the ordinary ones, but they have a different set of formats (e.g. %pI4 prints an IPv4 address)
  • bsearch: as in the C standard
  • sort: like the standard qsort, but it additionally takes a function that swaps two elements (NULL selects the default swap)

These functions are declared in different headers than in user space: linux/string.h, linux/bsearch.h, linux/sort.h, etc.
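
As an illustration of the different interface of the kstrto* family: the result is stored through a pointer, and the return value is only 0 or a negative error code. The fragment below is a sketch; parse_irq is a hypothetical helper and rejecting negative numbers is its own policy.

```c
#include <linux/errno.h>
#include <linux/kernel.h>

/* Hypothetical helper: parse a decimal, non-negative IRQ number. */
static int parse_irq(const char *text, int *irq)
{
	int ret = kstrtoint(text, 10, irq);

	if (ret)
		return ret;	/* -EINVAL (not a number) or -ERANGE (overflow) */
	if (*irq < 0)
		return -EINVAL;
	return 0;
}
```

Unlike strtol, the whole string has to be a valid number (a single trailing newline is tolerated), so no separate end-pointer check is needed.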

Exercise

  • Compile and run the sample modules.
  • Experimentally investigate the maximum size that can be allocated with kmalloc.
  • Make the 4th example work for larger buffers (using vmalloc).
  • Find and explain the security hole in one of the sample codes. Consider the consequences of this type of errors in the kernel code.

Literature

  1. man insmod, rmmod, lsmod, modprobe, depmod, modinfo
  2. A. Rubini, J. Corbet “Linux Device Drivers” 2nd Edition, O’Reilly 2001, chapters II and XI - http://www.xml.com/ldd/chapter/book
  3. Peter Salzman, Ori Pomerantz “The Linux Kernel Module Programming Guide”, 2001 - http://www.faqs.org/docs/kernel
  4. http://tldp.org/HOWTO/Module-HOWTO/
  5. http://tldp.org/LDP/lkmpg/2.6/html/index.html
  6. Documentation/kbuild/makefiles.txt, modules.txt