=======================
Class 5: Kernel modules
=======================

Date: 28.03.2019

.. contents::

.. toctree::
   :hidden:

Additional materials
====================

- :download:`examples.tar` -- example modules

What is a module?
=================

A module is relocatable code / data that can be inserted into and removed from
the kernel while the system is running. A module may refer to (exported)
kernel symbols as if it were compiled as part of the kernel, and may itself
provide (export) symbols that other modules may use. A module is responsible
for a certain service in the kernel -- for example, modules can be device and
file system drivers, network filters, cryptographic algorithms, etc.

Modules are compiled for a specific version and configuration of the kernel --
using a module built for another kernel version (or for the same version with
a significantly different configuration) will probably fail. The module
loading system tries to detect and prevent such a situation.

Programs and files related to module management in Linux
========================================================

``insmod module_name.ko [parameters]``
    Loads the given module file into the kernel. If parameters are given, they
    are passed to the module. You must give the full path to the module file --
    ``insmod`` does not search for it. Parameters have the form
    ``variable=value``, e.g. ::

        insmod ne.ko io=0x300 irq=7

``modprobe module_name [parameters]``
    A user-friendly interface to ``insmod`` -- loads the module, searching for
    it in ``/lib/modules`` and loading all required dependencies. For this
    purpose it uses the database of dependencies between modules created by
    ``depmod`` (see below). By default, modules are searched for in the
    ``/lib/modules/`` directory.

``depmod -a``
    Creates the database of dependencies between modules for the current
    kernel. The dependencies are written to the file
    ``/lib/modules//modules.dep``.
``/etc/modprobe.conf`` and/or ``/etc/modprobe.d/*``
    Files that control the behavior of ``modprobe`` and ``depmod``.
    Traditionally there was a single configuration file. Nowadays the
    ``/etc/modprobe.d/`` directory is used instead, because it is easier to
    modify: options live in separate files, so you can add some (e.g. when
    installing a device) without touching the existing files. The most
    important commands:

    ``alias name module_name``
        Defines that the module ``module_name`` is to be loaded when loading
        of ``name`` is requested, e.g. ::

            alias eth0 ne2k-pci

        loads the corresponding network adapter module when loading of the
        ``eth0`` module is requested.

    ``options module_name options``
        Sets the options used for every request to load the given module,
        e.g. ::

            options ne io=0x300 irq=10

        results in the ``io=0x300 irq=10`` options being used whenever the
        ``ne`` module is loaded.

    ``install module_name command...``
        Executes the given shell command instead of loading the given module.
        The command may itself load one or several modules, e.g. ::

            install foo /sbin/modprobe bar; /sbin/modprobe --ignore-install foo $CMDLINE_OPTS

        The ``--ignore-install`` option makes ``modprobe`` ignore the
        ``install`` command for ``foo``; without it, the command would invoke
        itself in a loop. ``$CMDLINE_OPTS`` is replaced with the options given
        in the ``modprobe`` call or specified with ``options`` commands. The
        ``install`` command is also useful for other tricks, such as loading
        firmware after loading a module. It can also load the first matching
        module out of several::

            install probe-ethernet /sbin/modprobe e100 || /sbin/modprobe eepro100

        Checking stops at the first module that loads successfully. In this
        case, that is the first module matching the network card.
    ``blacklist module_name``
        Prevents the module from being loaded automatically (e.g. by udev).
        This is useful for unused drivers and for debug modules that are not
        needed (e.g. ``evbug``).

``rmmod module_name``
    Removes the given module from the kernel (``module_name`` is the name of
    the module, not the name of the ``.ko`` file). The kernel automatically
    tracks which modules are currently in active use (e.g. they are
    dependencies of other modules, control a mounted filesystem, or support a
    device opened by some process) and refuses to remove them. If you want to
    remove a module that is in use, you can use the ``-f`` option, but it
    usually ends very badly.

``lsmod``
    Lists all loaded modules with information about their dependencies (the
    same data can be seen with ``cat /proc/modules``).

``modinfo module_or_file_name``
    Prints the description of the module along with a list of its parameters.

Creating modules
================

Kernel modules (as well as the main kernel code) are written in C (using
other languages is not possible). The environment inside the kernel, however,
is quite specific, and writing kernel code differs greatly from writing an
ordinary user space program.

Kernel code is written according to the official coding style --
https://www.kernel.org/doc/html/v4.15/process/coding-style.html .

Compilation of modules
----------------------

To compile modules, you need a directory with the configured and compiled
kernel source. In principle, the header files and the configuration are
enough, but separating the appropriate files from the rest is a very
complicated process, and only Linux distributions with a large number of
their own scripts are able to do so.

The Kbuild system, a fairly complicated layer on top of ``make``, is
responsible for compiling the modules (as well as the kernel itself).

To compile our module, we need to create a ``Kbuild`` file describing our
code. For example::

    obj-m := module.o different_module.o

compiles the ``module.c`` file to the ``module.ko`` module, and the
``different_module.c`` file to the ``different_module.ko`` module. If we want
to combine several source files into one module, we can do it as follows::

    obj-m := module.o
    module-objs := module_p1.o module_p2.o

This Kbuild file will compile the ``module_p1.c`` and ``module_p2.c`` files
and combine them into the ``module.ko`` module.

To build the module, run ``make`` in the kernel source directory, pointing it
to our directory with external modules::

    make -C /usr/src/linux- M=/home//my_modules

For simplicity, you can write your own ``Makefile`` that calls the
appropriate command (see example).

Module metadata
---------------

Each module can (but does not have to) define metadata using the following
macros (defined in ``linux/module.h``)::

    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Horse Fred");
    MODULE_DESCRIPTION("Driver for my device");

Metadata defined in this way is stored (along with much other data) in the
``.modinfo`` section of the finished module and can be printed with the
``modinfo`` command.

The choice of license has an important and non-obvious effect -- only a
module declaring a GPL-compatible license may use kernel symbols marked as
available exclusively to GPL-licensed modules.
The following are recognized as compatible licenses:

- ``"GPL"`` -- GNU Public License v2 or later,
- ``"GPL v2"`` -- GNU Public License v2,
- ``"GPL and additional rights"`` -- GNU Public License v2 + additional rights,
- ``"Dual BSD/GPL"`` -- GNU Public License v2 or BSD license to choose from,
- ``"Dual MPL/GPL"`` -- GNU Public License v2 or Mozilla to choose from,
- ``"Dual MIT/GPL"`` -- GNU Public License v2 or MIT to choose from.

Module constructor and destructor
---------------------------------

Modules do not have a ``main`` function or their own process / thread (unless
they create one themselves, which is quite rare). Instead, the module's code
is called by various kernel subsystems when there is something for it to do.

Each module can define a function initializing the module (constructor) and
a function releasing the module (destructor). These functions are defined as
follows::

    int my_init_function(void)
    {
        /* ... */
    }

    void my_cleanup_function(void)
    {
        /* ... */
    }

    module_init(my_init_function);
    module_exit(my_cleanup_function);

The init function is called when the module is loaded. If everything went
well, it should return 0. If it failed to initialize the module, it should
return an error code (a negated code from ``errno*.h``) -- the module will
then be immediately removed by the kernel. The cleanup function is called
when the module is removed (but it is not called when the init function has
returned an error).

The task of the init function is to register the functionality provided by
the module in the kernel structures -- for example, a PCI device driver will
use this function to inform the PCI subsystem of the devices it supports and
of the functions that should be called when a matching device is detected.
Without such registration, the kernel will never call the code of our module,
so modules without an init function are basically only useful as code
libraries for other modules.
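
A minimal sketch of a complete module built from these two functions. The
names ``sketch_setup``, ``sketch_init`` and ``sketch_exit`` are hypothetical,
and ``sketch_setup`` only stands in for real registration work; the code
assumes it is built with Kbuild against a configured kernel tree::

    #include <linux/errno.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    MODULE_LICENSE("GPL");

    /* Hypothetical stand-in for real setup; returns 0 or a negated errno. */
    static int sketch_setup(void)
    {
        /* A real driver could fail here, e.g. with: return -ENOMEM; */
        return 0;
    }

    static int sketch_init(void)
    {
        int err = sketch_setup();

        /* On error the module is discarded and sketch_exit() is not called. */
        if (err)
            return err;
        return 0;
    }

    static void sketch_exit(void)
    {
        /* Undo whatever sketch_setup() did. */
    }

    module_init(sketch_init);
    module_exit(sketch_exit);

The error code is passed on unchanged rather than replaced with a generic
value, so the reason for the failure reaches the program that tried to load
the module.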

The task of the cleanup function is to reverse everything that the init
function has done and to clean up after all of the module's activity.

If the module has an init function, always provide a cleanup function (even
if it has to be empty) -- otherwise, the kernel will conclude that our module
does not support removal and will not allow ``rmmod`` to be executed on it.

Sometimes you can find older modules that use functions with the default
names ``init_module()`` and ``cleanup_module()``, without declaring them with
``module_init()`` and ``module_exit()``. This is not recommended in current
kernel versions.

A module should have only one constructor and only one destructor.

printk
------

For debugging and for reporting important events, you can use the ``printk``
function, which is similar to ``printf``::

    printk(KERN_WARNING "Failed, error code: %d\n", err);

The message is preceded by its priority (note that there is no comma between
the priority and the format string), which may be one of (in ascending
order):

- ``KERN_DEBUG``
- ``KERN_INFO``
- ``KERN_NOTICE``
- ``KERN_WARNING``
- ``KERN_ERR``
- ``KERN_CRIT``
- ``KERN_ALERT``
- ``KERN_EMERG``

The messages printed by ``printk`` are available in the system log, which can
be viewed using the ``dmesg`` command. If they have a high enough priority,
they are also immediately written to the console.

The first example module shows the use of ``printk`` as well as the
constructor and the destructor.

Using external symbols
----------------------

In modules, you can freely use symbols defined and exported by the main
kernel code and by other modules (you can view them in the file
``/proc/kallsyms``). For a symbol of our module to be visible from the
outside, it has to be exported with the macro ``EXPORT_SYMBOL``::

    int my_function(int x)
    {
        /* ... */
    }
    EXPORT_SYMBOL(my_function);

There is also a similar macro, ``EXPORT_SYMBOL_GPL``, which exports a symbol
only to modules under the GPL (or a compatible) license.
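
As a rough sketch of how exporting fits together with the rest of a module,
the provider below exports one function to GPL-compatible modules only. The
module and the function ``sketch_add`` are made up for illustration; the code
assumes a Kbuild build against a configured kernel tree::

    #include <linux/kernel.h>
    #include <linux/module.h>

    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Sketch: module exporting one symbol");

    /* Hypothetical helper offered to other modules. */
    int sketch_add(int a, int b)
    {
        return a + b;
    }
    EXPORT_SYMBOL_GPL(sketch_add);

    static int sketch_provider_init(void)
    {
        printk(KERN_INFO "sketch_provider: sketch_add() is available\n");
        return 0;
    }

    static void sketch_provider_exit(void)
    {
        printk(KERN_INFO "sketch_provider: unloading\n");
    }

    module_init(sketch_provider_init);
    module_exit(sketch_provider_exit);

A module using the function would declare ``int sketch_add(int a, int b);``
(normally through a shared header), call it like any other function, and
carry a GPL-compatible ``MODULE_LICENSE``. The provider has to be loaded
first and cannot be removed while such a user is loaded.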

The ``depmod`` program automatically collects information on the dependencies
between modules that result from the use of exported symbols, and
``modprobe`` uses it to load the modules in the correct order.

The second example module shows the export of symbols and the use of exported
symbols.

Parameterization of modules
---------------------------

You can declare that a given variable holds a parameter that can be set when
the module is loaded. The name of the parameter is the same as the name of
the variable. When the module is being loaded, the values given by the user
(if any) are stored in the corresponding variables, e.g. ::

    insmod module.ko irq=5

stores the value 5 in the ``irq`` variable.

To declare that a variable is to be used as a module parameter, use the
following macro::

    module_param(variable, type, permissions);

The type can be one of: ``byte``, ``short``, ``ushort``, ``int``, ``uint``,
``long``, ``ulong``, ``charp``, ``bool``, ``invbool``. The ``charp`` type is
used to pass strings (``char *``). The ``invbool`` type is a ``bool``
parameter that holds the negation of the value given by the user.

You can define your own parameter types -- to do that, you must define the
functions ``param_get_XXX``, ``param_set_XXX`` and ``param_check_XXX``.

``permissions`` are the file permissions that will be given to the
parameter's file in ``sysfs``.

Each parameter should have a description. It can be read along with the
description of the entire module using the ``modinfo`` program, so the module
carries the documentation of its own usage.

The description is given by the macro ``MODULE_PARM_DESC``::

    MODULE_PARM_DESC(variable, description);

Examples::

    int irq = 7;
    module_param(irq, int, 0);
    MODULE_PARM_DESC(irq, "Irq used for device");

    char *path = "/sbin/modprobe";
    module_param(path, charp, 0);
    MODULE_PARM_DESC(path, "Path to modprobe");

Use::

    printk(KERN_INFO "Using irq: %d\n", irq);
    printk(KERN_INFO "Will use path: %s\n", path);

To declare an array of parameters you must use a different macro::

    module_param_array(variable, type, pointer_to_count, permissions);

All fields except ``pointer_to_count`` have the same meaning as in
``module_param()``. ``pointer_to_count`` is a pointer to the variable to
which the number of elements in the array will be written. If you are not
interested in the number of elements, you can pass ``NULL``, but then you
need to recognize whether an element is present based on its contents, which
is not recommended. The maximum number of array elements is determined by the
array declaration, e.g. if we declare its size as 4, the user will be able to
pass at most 4 elements. In the description of an array parameter, the
maximum number of elements is normally given in square brackets.

Example::

    unsigned int num_paths = 2;
    char *paths[4] = {"/bin", "/sbin", NULL, NULL};
    module_param_array(paths, charp, &num_paths, 0);
    MODULE_PARM_DESC(paths, "Search paths [4]");

Use::

    int i;
    for (i = 0; i < num_paths; i++)
        printk(KERN_INFO "Path %d: %s\n", i, paths[i]);

Error handling
--------------

Kernel functions report failure by returning a negated error code (wrapped
with ``ERR_PTR`` when the function returns a pointer). Resources acquired
before the failure are released through a chain of ``goto`` labels, as in the
final part of the following function::

        /* ... */
        c->a = a;
        c->b = b;
        mutex_unlock(&lock);
        return c;

        /* Common error handling */
    err_c:
        release_b(b);
    err_b:
        release_a(a);
    err_a:
        mutex_unlock(&lock);
        return ERR_PTR(res);
    }

A list of error codes can be found in ``asm-generic/errno-base.h`` and
``asm-generic/errno.h``. Remember that many of these errors have strictly
defined semantics, sometimes only loosely related to their description, and
should only be used in specific situations. The most important codes:

``-EFAULT``
    Error when copying from / to user memory (any other usage is incorrect).
``-ENOMEM``
    Main memory (RAM) has been exhausted (not to be used for other types of
    resources).

``-ENOSPC``
    No space left on a disk or another sufficiently similar device.

``-ENOENT``
    The specified file (or other sufficiently similar resource) was not found.

``-ESRCH``
    The specified process was not found.

``-EPERM``
    The caller lacks the (loosely defined) permissions to perform the
    operation.

``-EACCES``
    The operation is forbidden by permissions on the file system.

``-EEXIST``
    The operation failed because the file (or other resource) already exists
    (used, for example, by operations that create files).

``-EIO``
    The device failed in an undefined manner, through no fault of the caller
    (scratched CD, etc.).

``-EINVAL``
    The user provided incorrect parameters (contradictory, not supported by
    the device, etc.).

``-ENOTTY``
    An attempt to perform an operation on an incompatible device type (e.g.
    an attempt to change the terminal settings on a regular file). Used
    primarily for rejecting unknown ``ioctl`` calls.

``-ERESTARTSYS``
    Used to interrupt waits when it is necessary to exit the kernel to
    deliver a signal to the user process -- in the right place it will be
    converted to ``-EINTR`` or will trigger a restart of the system call.

``-EINTR``
    The system call was interrupted by a signal -- *should not be used
    directly* (instead, return ``-ERESTARTSYS``).

``-ESPIPE``
    An attempt to change the file position on an object for which such a
    concept does not make sense (pipe, socket, terminal ...).

Returning ``-1`` instead of an error code, using an obviously incorrect error
code, or needlessly discarding the error code returned by a called function
will be worth negative points in the assignments.

Dynamic memory allocation for the kernel
----------------------------------------

There are many functions in the kernel that allow for dynamic memory allocation.
The most important and most commonly used is ``kmalloc``
(``linux/slab.h``)::

    void *kmalloc(size_t size, gfp_t flags);
    void kfree(void *obj);

The ``kmalloc`` function allocates a physically contiguous memory area of up
to 32 pages (on x86 this gives slightly less than 128 KiB, because a portion
of the memory is reserved by the kernel for a block header); the exact limit
depends on the kernel version and configuration. The allocation is fast
(buddy algorithm). The ``flags`` parameter specifies the type of allocation
(constants ``GFP_*`` defined in the file ``linux/gfp.h``) -- the most
important are:

- ``GFP_KERNEL`` -- the most commonly used one; it may block, so you can only
  use it from a process context,
- ``GFP_ATOMIC`` -- does not block, so it can be used from interrupt service
  routines (although this is usually a bad idea).

::

    void *vmalloc(unsigned long size);
    void vfree(const void *addr);

With ``vmalloc`` you can allocate an area of any size (provided that there is
enough free physical memory), but it is not physically contiguous (the area
is accessed through address translation).

::

    struct page *alloc_pages(gfp_t flags, unsigned int order);
    void __free_pages(struct page *page, unsigned int order);

Allocates ``2 ** order`` whole pages; the ``flags`` parameter specifies how
to allocate the pages (as in ``kmalloc``).

The fourth example shows the use of the ``kmalloc`` function.

A private heap
~~~~~~~~~~~~~~

When we have a lot of objects of identical size, it may be useful to create
our own heap designed specifically for the given type of objects. The
following functions are used for this purpose::

    struct kmem_cache *kmem_cache_create(const char *name, size_t size,
                                         size_t align, unsigned long flags,
                                         void (*ctor)(void *));
    void kmem_cache_destroy(struct kmem_cache *cachep);

The parameter ``flags`` is usually set to 0 (most flags are for debugging
purposes only).
The ``struct kmem_cache`` (called ``kmem_cache_t`` in old kernels) is our
private heap -- it consists of dynamically allocated pages cut into fragments
of exactly the specified size with minimal overhead, additionally arranged to
make good use of the CPU cache. We can allocate our objects on it with the
following functions::

    void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags);
    void kmem_cache_free(struct kmem_cache *cachep, void *objp);

The memory for the new object is initialized using the constructor specified
when creating the cache.

For the convenience of simple cases (if the constructor is not needed), the
macro ``KMEM_CACHE`` wrapping ``kmem_cache_create`` is defined.

Automatic loading of required modules -- kmod
---------------------------------------------

Kmod is a kernel subsystem that loads modules "on demand", i.e. when a
service related to the given module is requested. When a user requests access
to a device that is supported by a module that is not loaded, the kernel
suspends execution of the program and calls the function
``request_module()``, requesting the loading of the appropriate module. This
function is provided by kmod and works by executing a program
(``/sbin/modprobe`` by default, but this can be changed through
``/proc/sys/kernel/modprobe``) for the requested module.

If a module is to use on-demand module loading itself, it should include::

    #include <linux/kmod.h>

On-demand loading is done with the function::

    int request_module(const char *module_name);

Reference count
---------------

Each module has its own reference count -- as long as it is positive, the
kernel will not allow the module to be removed. It should be increased while
our module is in active use (e.g. it handles an open device or a mounted file
system). The management of this counter is usually done by other kernel
subsystems, but you have to help them by passing a pointer to your module
(the macro ``THIS_MODULE``). For example, for a character device driver, you
must fill the ``owner`` field of the ``file_operations`` structure with this
pointer.
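
A minimal sketch of the ``owner`` field in use, assuming a character device
registered with ``register_chrdev`` and a dynamically assigned major number.
The device name ``"sketch"`` and all ``sketch_*`` identifiers are
hypothetical, and the code assumes a Kbuild build against a configured
kernel tree::

    #include <linux/fs.h>
    #include <linux/module.h>

    MODULE_LICENSE("GPL");

    static int sketch_major;

    static int sketch_open(struct inode *inode, struct file *file)
    {
        return 0;
    }

    static int sketch_release(struct inode *inode, struct file *file)
    {
        return 0;
    }

    static const struct file_operations sketch_fops = {
        /* Lets the kernel hold a module reference while the device is open. */
        .owner = THIS_MODULE,
        .open = sketch_open,
        .release = sketch_release,
    };

    static int sketch_init(void)
    {
        /* Major 0 asks for a dynamic major, returned on success. */
        int ret = register_chrdev(0, "sketch", &sketch_fops);

        if (ret < 0)
            return ret;
        sketch_major = ret;
        return 0;
    }

    static void sketch_exit(void)
    {
        unregister_chrdev(sketch_major, "sketch");
    }

    module_init(sketch_init);
    module_exit(sketch_exit);

With ``owner`` set, ``rmmod`` should refuse to remove the module for as long
as some process keeps the device open.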

Libraries
---------

Inside the kernel, it is not possible to use any libraries known from user
space, not even the standard C library. However, the kernel has its own
library of basic functions, containing many functions known from the standard
C library or very similar to them, including:

- most of the functions known from ``string.h`` (``memcpy``, ``strcmp``,
  ``strcpy``, ...)
- ``kstrto[u](int|l|ll)``: functions converting strings to numbers, similar
  to the standard ``strto*``, but with a different interface
- ``malloc``, ``free``, ``calloc``: do not exist, replaced by ``kmalloc``,
  ``vmalloc``, and several other memory allocators depending on your needs
- ``snprintf``, ``sscanf``: they work similarly to the ordinary ones, but
  they have a different set of formats (e.g. ``%pI4`` prints an IPv4 address)
- ``bsearch``: as in the C standard
- ``sort``: like the standard ``qsort``, but you also need to pass a function
  that swaps two elements

These functions are declared in different headers than usual:
``linux/string.h``, ``linux/bsearch.h``, etc.

Exercise
========

- Compile and run the sample modules.
- Experimentally investigate the maximum size that can be allocated with
  ``kmalloc``.
- Make the 4th example work for larger buffers (using ``vmalloc``).
- Find and explain the security hole in one of the sample codes. Consider the
  consequences of this type of error in kernel code.

Literature
==========

1. ``man insmod``, ``rmmod``, ``lsmod``, ``modprobe``, ``depmod``,
   ``modinfo``
2. A. Rubini, J. Corbet "Linux Device Drivers" 2nd Edition, O'Reilly 2001,
   chapters II and XI - http://www.xml.com/ldd/chapter/book
3. Peter Salzman, Ori Pomerantz "The Linux Kernel Module Programming Guide",
   2001 - http://www.faqs.org/docs/kernel
4. http://tldp.org/HOWTO/Module-HOWTO/
5. http://tldp.org/LDP/lkmpg/2.6/html/index.html
6. ``Documentation/kbuild/makefiles.txt``, ``modules.txt``