.. _lab-chardev-en:

=============================
Class 9: Character Devices
=============================

Date: 29.04.2025

Additional Materials
====================

- :download:`example.tar`
- :ref:`small_task_8`

.. highlight:: c

About Devices and Drivers in Linux
========================================

A device driver is a collection of code that supports a device (usually
hardware, but virtual devices exist) and exports a set of functions that let
users work with the device. The driver almost always has exclusive direct
access to the device. It is often responsible for reconciling the needs of
different users (although higher layers frequently handle this -- the file
system can be considered such a layer for a hard drive).

Device drivers are usually kernel modules. If the device is not too "close" to
the processor, it is also possible to write a driver in user space (USB devices
are well suited to this), but we will not cover this in class.

To make a driver's functionality available to user programs, you must use one
of the many mechanisms for communicating with the kernel. The most common are:

- block device file -- used for hard drives and sufficiently similar devices
  (floppy disk, CD-ROM, SSD, ...). The driver exposes read and write functions
  for blocks, and the block layer handles requests from the user (buffering,
  queuing, etc.).
- character device file -- used for most types of devices. The driver simply
  provides functions corresponding to the system calls that operate on files --
  the kernel passes these calls directly to the driver, allowing the
  implementation of any interface.
- network interface -- the driver exposes functions for sending and receiving
  packets, to which the network subsystem connects. The user can use it through
  socket calls.
- file in ``proc`` -- used for drivers with a trivial interface (e.g., the
  entire driver functionality is reading/writing a single parameter).
- file in ``sysfs`` -- as above, but a newer (and simpler) interface.

File Representation
---------------------

Using character and block devices involves creating a corresponding special
file somewhere in the file system (almost always ``/dev``) and opening it. This
special file is only a "gateway to the kernel" and the only information about
it stored in the file system is:

- a flag indicating the device type (``b`` -- block, ``c`` -- character)
- a major device number
- a minor device number

Typically, the major number chooses the driver, and the minor number chooses a
specific instance of the device supported by that driver. However, there are
cases where a single major number is shared by many drivers that each export
only one device.

At the C language level, both numbers are packed into a single number of type
``dev_t``. The following macros are useful (``linux/kdev_t.h``):

``int MAJOR(dev_t number)``
  returns the major device number.

``int MINOR(dev_t number)``
  returns the minor device number.

``dev_t MKDEV(int major, int minor)``
  packs the two numbers into a ``dev_t`` value.

To create a device file manually, you can use the following command::

    mknod /dev/filename type major minor

Running ``ls -l`` on a device file shows the device numbers in the column
where the file size usually appears.

The kernel exports live information about available device drivers in the
``sysfs`` file system, and the ``udevd`` program (or ``systemd-udevd``)
monitors this information continuously and creates the appropriate files in
``/dev``. Because the numbers no longer need to be fixed in advance, they are
allocated dynamically. To make this mechanism work, the driver must register
its device in the ``sysfs`` hierarchy.

Character Device Drivers in Linux
=================================

Registering Device Drivers
--------------------------

Registering a driver involves several steps:

1. Allocate a range of device numbers::

       int alloc_chrdev_region(dev_t *dev, unsigned baseminor,
                               unsigned count, const char *name);
       void unregister_chrdev_region(dev_t first, unsigned int count);

   This allocation is usually done only once, in the initialization function,
   reserving the entire range of numbers as a precaution.

2. Prepare the ``file_operations`` structure describing the operations on our
   device. Such structures are usually global (there is no point in allocating
   them dynamically).

3. Create and fill the ``cdev`` structure::

       void cdev_init(struct cdev *cdev, const struct file_operations *fops);

   You can also request dynamic allocation of the structure::

       struct cdev *cdev_alloc(void);

   In this case, you must manually fill the ``ops`` field with a pointer to our
   structure (do not mix this call with ``cdev_init``).

4. Register our ``cdev`` structure::

       int cdev_add(struct cdev *p, dev_t dev, unsigned count);
       void cdev_del(struct cdev *p);

   At this point, our device becomes available to user space when someone opens
   the appropriate special file. We can attach devices individually or as a
   range (``count`` parameter).

   If the ``cdev`` structure was created by ``cdev_alloc``, the structure will
   be automatically released by ``cdev_del``. If, on the other hand, it was
   initialized by ``cdev_init``, releasing it is the driver's responsibility.

   Note that ``cdev_del`` only detaches the device from the device array, but
   does not guarantee that no one is using it anymore -- previously opened file
   descriptors will still work (although if we are in ``module_exit``, we are
   guaranteed that there are no such descriptors). When implementing, e.g., a
   device that should support hot-unplug, you must ensure this yourself, e.g.,
   by counting references.

5. Register the device class in ``sysfs`` (or use an existing one if it
   fits)::

       struct class my_class = {
           .name = "abc",
       };
       int class_register(struct class *class);
       void class_unregister(struct class *class);

   This is only done once for all our devices (or once per device type if we
   have many).

6. Register our device in ``sysfs``::

       struct device *device_create(struct class *cls, struct device *parent,
                                    dev_t devt, void *drvdata,
                                    const char *fmt, ...);
       void device_destroy(struct class *cls, dev_t devt);

   ``parent`` points to the device to which our device is connected -- the
   directory in sysfs corresponding to our device will be a subdirectory of the
   directory of the specified device. For character device drivers
   corresponding to, e.g., PCI devices, the parent will be set to the ``dev``
   field of the ``pci_dev`` structure. You can set this parameter to ``NULL``
   to receive a top-level device.

   ``drvdata`` can be used to store additional private information for our
   driver (useful if, e.g., we want to create files in sysfs to control our
   device).

   ``fmt`` and subsequent parameters are passed to ``sprintf`` to create the
   device name that will appear in ``/dev``.

   At this point, ``udevd`` will receive a notification about the new device
   and (in finite time) create the appropriate file in ``/dev``.

The ``file_operations`` structure
---------------------------------

The ``file_operations`` structure (defined in ``linux/fs.h``) describes how to
perform operations on a given file. Every file (and in general everything that
can be an open file descriptor) in Linux has such a structure -- for ordinary
files, it is provided by the file system driver. For character devices, it is
provided by the device driver.
It has many fields (corresponding to operations), the most important of which
are::

    struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
        long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*release) (struct inode *, struct file *);
        /* ... */
    };

We must set the ``owner`` field to ``THIS_MODULE`` -- this allows the kernel to
automatically manage the module's reference counter.

Fields of the ``file`` structure
---------------------------------

The ``file`` structure (defined in ``linux/fs.h``) represents an open file
within the kernel. It is created by the kernel when ``open`` is called and
passed to all operations on the file until the last ``close`` call (i.e., when
``release`` is called). It is worth noting that an open file (the ``file``
structure) is a different thing than a file on disk (represented by the
``inode`` structure).

The most important fields of this structure are::

    struct file {
        fmode_t f_mode;
        loff_t f_pos;
        unsigned int f_flags;
        const struct file_operations *f_op;
        void *private_data;
        /* ... */
    };

The ``f_mode`` field allows you to determine whether the file is open for
reading (``FMODE_READ``), writing (``FMODE_WRITE``), or both. You do not need
to check this field in the ``read`` and ``write`` functions, because the kernel
performs this test before calling the driver's function.

The ``f_pos`` field specifies the position for writing or reading (used by
``read``, ``write``, ``lseek``, etc.).

The ``f_flags`` field is mainly used to check whether the operation should be
blocking or not (``O_NONBLOCK``), although it contains many more flags.
The ``f_op`` field specifies a set of functions that implement file operations.
This field is set (to operations from the ``cdev`` structure) by the kernel
when ``open`` is called, and then it is used for all subsequent operations (the
driver can replace the value of this field in ``open`` to choose an alternative
set of functions).

The ``private_data`` pointer is set to ``NULL`` when the file is opened. The
driver can use this pointer for its own purposes (in which case it is
responsible for freeing the memory allocated for this field).

The ``open`` operation -- opening a file
-----------------------------------------

The prototype of this function looks as follows::

    int open(struct inode *inode, struct file *filp)

The ``open`` operation allows the driver to perform preparatory operations
before other operations. Usually, the following steps are performed:

- check for errors related to the device (e.g., check if the device is ready);
- initialize the device if it is being opened for the first time and we are
  using lazy initialization;
- identify the minor number (``MINOR(inode->i_rdev)``) and, if necessary,
  replace the set of operations pointed to by ``f_op``;
- allocate memory for data related to the device, initialize the data
  structures, and assign the ``private_data`` pointer;

The ``release`` operation -- closing a file
--------------------------------------------

The kernel keeps a reference counter for each existing ``file`` structure. It
can be increased, for example, by calling ``dup`` or inheriting an open file by
``fork``.
When this counter finally falls to 0 (``close`` is called on the last
descriptor, or the process holding that descriptor calls ``exit``), the
``release`` function is called, serving as the file's destructor::

    int release(struct inode *inode, struct file *filp)

Its task is to free the resources allocated in the ``open`` operation and
perform similar cleanup operations:

- free the ``private_data`` memory;
- turn off the device when it is the last ``release`` call;

``read`` and ``write`` operations -- data transfer
--------------------------------------------------

The prototypes of these functions look as follows::

    ssize_t read(struct file *filp, char __user *buff,
                 size_t count, loff_t *offp)
    ssize_t write(struct file *filp, const char __user *buff,
                  size_t count, loff_t *offp)

The task of the ``read`` operation is to copy a portion of data from the kernel
address space to a specified address (``buff``) in the user address space. The
``write`` operation works in the opposite direction. These functions are used
to implement many system calls (``read``, ``pread``, ``readv``, ...).

``offp`` is a pointer to the current position in the file. If such a position
makes sense for our file, we read it from there and store the updated position
back there. For a regular ``read`` and ``write``, it will be a pointer to
``filp->f_pos``, and for ``pread`` and ``pwrite`` it will be a pointer to a
variable on the stack containing the system call parameter.
The value returned by this function will be interpreted as follows:

- a value greater than zero indicates the number of bytes copied; if it is
  equal to the value of the argument passed to the system call, it indicates
  complete success; if it is smaller, it means that only part of the data was
  transferred -- then it should be expected that the program will repeat the
  system call (e.g., this is the standard behavior of the library functions
  ``fread``/``fwrite``)
- if the value is equal to ``0``, the end of the file has been reached (used
  only in ``read``)
- a negative value indicates an error

When implementing these operations, remember to maintain the correct semantics
-- returning an error means that no bytes were read/written. If our driver
detects an error only after a certain amount of transferred bytes (and there is
no easy way to undo the transfer), return the number of transferred bytes
instead of an error -- the error code will be returned when the user repeats
the operation for the remaining bytes.

``llseek`` operation -- changing the file position
---------------------------------------------------

The prototype of this function looks as follows::

    loff_t llseek(struct file *filp, loff_t off, int whence)

The ``llseek`` operation implements the system calls ``lseek`` and ``llseek``.

The default kernel behavior when the ``llseek`` operation is not specified in
the driver's operations is to change the ``f_pos`` field of the file structure.

If the concept of changing the file position does not make sense for the
device, write a function that returns an error here. A ready-made function
``no_llseek`` is available in the kernel for this purpose, which always returns
``-ESPIPE``.
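
The position and return-value rules for ``read`` and ``llseek`` described above
can be tried out without loading a module. The following is a minimal
user-space sketch, not kernel code: ``model_read`` and ``model_llseek`` are
hypothetical names, the fixed ``greeting`` buffer stands in for device data,
and ``memcpy`` stands in for ``copy_to_user`` (which a real driver must use,
returning ``-EFAULT`` on failure)::

    #include <errno.h>
    #include <stddef.h>
    #include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */
    #include <string.h>
    #include <sys/types.h>  /* ssize_t */

    /* Hypothetical device contents used only for this illustration. */
    static const char greeting[] = "Hello, world!\n";
    #define GREETING_LEN (sizeof(greeting) - 1)

    /*
     * User-space model of a driver's read operation.
     * offp plays the role of the loff_t pointer passed by the kernel.
     */
    static ssize_t model_read(char *buff, size_t count, long long *offp)
    {
        long long pos = *offp;

        if (pos < 0)
            return -EINVAL;              /* negative value: error */
        if ((size_t)pos >= GREETING_LEN)
            return 0;                    /* zero: end of file */
        if (count > GREETING_LEN - (size_t)pos)
            count = GREETING_LEN - (size_t)pos;   /* short read */

        /* Assumption: a real driver calls copy_to_user() here. */
        memcpy(buff, greeting + pos, count);

        *offp = pos + (long long)count;  /* store the updated position */
        return (ssize_t)count;           /* number of bytes copied */
    }

    /* User-space model of llseek for the same fixed-size buffer. */
    static long long model_llseek(long long *f_pos, long long off, int whence)
    {
        long long newpos;

        switch (whence) {
        case SEEK_SET:
            newpos = off;
            break;
        case SEEK_CUR:
            newpos = *f_pos + off;
            break;
        case SEEK_END:
            newpos = (long long)GREETING_LEN + off;
            break;
        default:
            return -EINVAL;
        }
        if (newpos < 0)
            return -EINVAL;              /* position left unchanged */

        *f_pos = newpos;
        return newpos;
    }

With the position starting at 0, a 5-byte ``model_read`` returns 5 and advances
the position to 5; the next call with a 32-byte buffer returns only the
remaining 9 bytes, and the call after that returns 0 to signal end of file.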
``ioctl`` operation -- invoking device-specific commands
-------------------------------------------------------------------

The function prototypes look as follows::

    long (*unlocked_ioctl) (struct file *filp, unsigned int cmd,
                            unsigned long arg);
    long (*compat_ioctl) (struct file *filp, unsigned int cmd,
                          unsigned long arg);

The ``unlocked_ioctl`` function handles ``ioctl`` calls from programs built for
the kernel's native ("main") architecture. The name is a historical artifact
from the big kernel lock era -- drivers requiring the big kernel lock used to
fill the ``ioctl`` field, while newer or converted drivers using their own
locks used ``unlocked_ioctl``. In current kernel versions, the big kernel lock
and the ``ioctl`` field no longer exist.

The ``compat_ioctl`` function handles ``ioctl`` calls from user programs
running in 32-bit compatibility mode -- e.g., programs for the i386
architecture under a kernel for the x86_64 architecture. If the structures
passed through ``ioctl`` do not contain fields of architecture-dependent size,
both fields can be set to the same function.

The first argument corresponds to the file descriptor passed by the system
call. The ``cmd`` argument is exactly the same as in the system call. The
optional ``arg`` argument is passed as an ``unsigned long`` number regardless
of the type used in the system call.

Typically, the implementation of the ``ioctl`` operation simply contains a
``switch`` statement selecting the appropriate behavior depending on the value
of the ``cmd`` argument. Different commands are represented by different
numbers, which are usually given names using preprocessor definitions. The user
program should be able to include a header file with such declarations (usually
the same one that is used when compiling the driver module).

It is the responsibility of the driver interface developer to determine the
numerical values corresponding to the commands interpreted by the driver.
The simple choice of assigning successive small values to individual commands
is unfortunately not a good solution in general. Commands should be unique
across the system to avoid errors when a correct command is sent to an
incorrect device. This situation may not occur very often, but its consequences
can be serious. When every ``ioctl`` command in the system has a different
number, a mistake results in ``-ENOTTY`` being returned instead of an
unintended action being performed.

The following macros (defined in ``linux/ioctl.h``) should be used when
determining numerical values for commands:

``_IO(type, nr)``
  general-purpose command (without an argument)

``_IOR(type, nr, dataitem)``
  command with write to user space

``_IOW(type, nr, dataitem)``
  command with read from user space

``_IOWR(type, nr, dataitem)``
  command with write and read

Designations:

``type``
  unique number for the driver (8 bits, selected after reviewing
  ``Documentation/userspace-api/ioctl/ioctl-number.rst`` `[rendered] `_ ) -- a
  magic number

``nr``
  sequential command number (8 bits)

``dataitem``
  structure associated with the command; the size of the given structure
  usually cannot be greater than 16 KiB - 1 bytes (depends on the value of
  ``_IOC_SIZEBITS``). Encoding the size of the structure written/read as a
  parameter can be helpful for detecting programs compiled with outdated driver
  versions and helps to avoid, for example, writing beyond the buffer.

Example::

    #define DN_SETCOUNT _IOR(0, 3, int)

sysfs
=====

In the kernel, there is often a need to grant access to certain device data to
user space. Using character devices for this purpose is rather cumbersome -- a
character device is a fairly "heavy" object, and access to it is done through a
limited ``read/write`` interface or an inconvenient ``ioctl`` interface.

The first solution to these problems in Linux was the ``proc`` file system,
allowing easy creation of a large number of special files for communication
with the user.
However, it had several drawbacks: primarily the lack of structure (everyone
puts files wherever they like), and the fact that data transfer requires costly
and delicate formatting and parsing of a byte stream.

To solve the problems with the ``proc`` file system, the ``sysfs`` file system
was created. It has the following features:

- every device, driver, module, etc., in the system is based on a ``kobject``
  structure and automatically receives a directory in ``sysfs``
- directories in ``sysfs`` are organized hierarchically -- in the case of
  devices, each device is a subdirectory of the device it is connected to
- relationships between objects are represented by symbolic links
- object attributes are represented by files
- it is mounted in ``/sys``

To grant a user access to some functionality, you need to get access to your
``kobject`` structure and attach attributes to it. In the case of devices, the
``kobject`` structure is the ``kobj`` field of the ``device`` structure.

Adding Attributes to Devices
-------------------------------

.. TODO: outdated

To add an attribute (i.e., a file representing a single parameter) to an
object, you need to create a structure wrapping the ``attribute`` structure
appropriate for the object type. In the case of devices, this is
``device_attribute``::

    struct attribute {
        const char *name;
        umode_t mode;
    };

    struct device_attribute {
        struct attribute attr;
        ssize_t (*show) (struct device *dev, struct device_attribute *attr,
                         char *buf);
        ssize_t (*store) (struct device *dev, struct device_attribute *attr,
                          const char *buf, size_t count);
    };

And add it to our device through::

    int device_create_file(struct device *device,
                           const struct device_attribute *entry);
    void device_remove_file(struct device *dev,
                            const struct device_attribute *attr);

The ``show`` function is called when an attribute is read by the user and has
access to a buffer of ``PAGE_SIZE`` bytes to which it can write the attribute
value (and return its size). The ``store`` function is called when an attribute
is written and receives its complete value.
Unlike the ``read``/``write`` functions, these functions always operate on the
entire attribute value (no need to handle partial reads/writes).

Binary Attributes
------------------

Sometimes simple attributes are not enough and it is necessary to export binary
data, defining a custom implementation of ``read``/``write`` as with a regular
file. Binary attributes serve this purpose::

    struct bin_attribute {
        struct attribute attr;
        size_t size;
        void *private;
        ssize_t (*read) (struct kobject *kobj, char *buf,
                         loff_t off, size_t size);
        ssize_t (*write) (struct kobject *kobj, char *buf,
                          loff_t off, size_t size);
        int (*mmap) (struct kobject *kobj, struct bin_attribute *attr,
                     struct vm_area_struct *vma);
    };

    int sysfs_create_bin_file(struct kobject *kobj,
                              struct bin_attribute *attr);
    int sysfs_remove_bin_file(struct kobject *kobj,
                              struct bin_attribute *attr);

Directories
-----------

For more complex devices, it may be useful to create a tree-like structure of
objects. In this case, it is necessary to create your own object type::

    struct kobj_type {
        void (*release)(struct kobject *kobj);
        const struct sysfs_ops *sysfs_ops;
        struct attribute **default_attrs;
    };

    struct sysfs_ops {
        ssize_t (*show)(struct kobject *kobj, struct attribute *attr, char *);
        ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
                         const char *, size_t);
    };

The ``release`` function will be called by the kernel when all references to
the object disappear.

``sysfs_ops`` contains functions handling attributes -- e.g., in the case of
``struct device``, these are simply functions casting ``kobject`` to
``device``, ``attribute`` to ``device_attribute``, and calling
``show``/``store`` from the given attribute.

``default_attrs`` is a pointer to an array of pointers to attributes
(terminated by ``NULL``), which will be added to the objects of the given type
upon creation.
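
The "casting" performed by the ``sysfs_ops`` functions mentioned above is
pointer arithmetic from an embedded member back to its enclosing structure (the
kernel uses ``container_of`` for this). The following user-space sketch models
the idea under stated assumptions: all ``model_*`` names and ``id_show`` are
hypothetical, the structures are heavily simplified, and returning ``-EIO``
when no ``show`` callback is set mirrors what the kernel's device code is
understood to do::

    #include <errno.h>
    #include <stddef.h>     /* offsetof */
    #include <stdio.h>
    #include <sys/types.h>  /* ssize_t */

    /* Simplified stand-in for the kernel's container_of() macro. */
    #define model_container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* Simplified stand-ins for kobject and attribute. */
    struct model_kobject { const char *name; };
    struct model_attribute { const char *name; };

    /* A "device" embeds a kobject, just as struct device does. */
    struct model_device {
        int id;
        struct model_kobject kobj;
    };

    /* A typed attribute wraps the generic one and adds typed callbacks. */
    struct model_device_attribute {
        struct model_attribute attr;
        ssize_t (*show)(struct model_device *dev, char *buf);
    };

    /*
     * Generic show handler: receives only the generic pointers, recovers
     * the enclosing typed structures, and forwards the call.
     */
    static ssize_t model_dev_attr_show(struct model_kobject *kobj,
                                       struct model_attribute *attr,
                                       char *buf)
    {
        struct model_device *dev =
            model_container_of(kobj, struct model_device, kobj);
        struct model_device_attribute *dattr =
            model_container_of(attr, struct model_device_attribute, attr);

        if (!dattr->show)
            return -EIO;    /* attribute has no read handler */
        return dattr->show(dev, buf);
    }

    /* Example typed callback: prints the device id followed by a newline. */
    static ssize_t id_show(struct model_device *dev, char *buf)
    {
        return sprintf(buf, "%d\n", dev->id);
    }

The generic layer only ever sees ``model_kobject`` and ``model_attribute``
pointers, which is why a single handler can serve every attribute of a given
object type.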
To create a new object, use the function::

    int kobject_init_and_add(
        struct kobject *kobj, struct kobj_type *ktype,
        struct kobject *parent, const char *fmt, ...);

Before calling it, you should initialize ``kobj`` with zeros.

To remove it (or rather, get rid of your reference -- the object will be
deleted when all references disappear), use::

    void kobject_put(struct kobject *kobj);

We can also duplicate our reference by::

    struct kobject *kobject_get(struct kobject *kobj);

To add attributes to such an object after its creation::

    int sysfs_create_file(struct kobject *kobj, const struct attribute *attr);
    void sysfs_remove_file(struct kobject *kobj, const struct attribute *attr);

Relationships between Objects
-----------------------------

Creating symbolic links in sysfs is possible through the functions::

    int sysfs_create_link(struct kobject *kobj, struct kobject *target,
                          char *name);
    void sysfs_remove_link(struct kobject *kobj, char *name);

Useful Macros
---------------

Instead of manually creating ``*_attribute`` structures, you can use macros
that shorten attribute definitions::

    static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);

This is the same as the manual declaration::

    static struct device_attribute dev_attr_foo = {
        .attr = {
            .name = "foo",
            .mode = S_IWUSR | S_IRUGO,
        },
        .show = show_foo,
        .store = store_foo,
    };

Going further down this path, you can do even more::

    static DEVICE_ATTR_RW(foo_rw);
    static DEVICE_ATTR_RO(foo_ro);
    static DEVICE_ATTR_WO(foo_wo);

These macros assume that the functions are named accordingly:
``foo_rw_show``, ``foo_rw_store``, ``foo_ro_show``, and ``foo_wo_store``.
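
The naming convention comes from preprocessor token pasting: the macro glues
``_show`` or ``_store`` onto the attribute name. The sketch below imitates this
in user space so that the mechanism can be compiled and inspected; it is not
the kernel's definition. ``MODEL_ATTR_RO``, ``struct model_attribute``,
``MODEL_PAGE_SIZE`` and ``foo_value`` are hypothetical, and the simplified
``show`` callback takes only the output buffer::

    #include <stdio.h>
    #include <sys/types.h>  /* ssize_t */

    /* Assumed buffer size, standing in for the kernel's PAGE_SIZE. */
    #define MODEL_PAGE_SIZE 4096

    /* Simplified stand-in for struct device_attribute. */
    struct model_attribute {
        const char *name;
        unsigned short mode;
        ssize_t (*show)(char *buf);
    };

    /*
     * Read-only attribute macro in the spirit of DEVICE_ATTR_RO():
     *   #_name        turns the identifier into the file name string,
     *   _name##_show  builds the name of the expected callback.
     */
    #define MODEL_ATTR_RO(_name)                            \
        struct model_attribute model_attr_##_name = {       \
            .name = #_name,                                 \
            .mode = 0444,                                   \
            .show = _name##_show,                           \
        }

    /* Hypothetical value exported through the attribute. */
    static int foo_value = 42;

    /* Must be called foo_ro_show for MODEL_ATTR_RO(foo_ro) to compile. */
    static ssize_t foo_ro_show(char *buf)
    {
        return snprintf(buf, MODEL_PAGE_SIZE, "%d\n", foo_value);
    }

    /* Expands to: static struct model_attribute model_attr_foo_ro = {...}; */
    static MODEL_ATTR_RO(foo_ro);

Renaming ``foo_ro_show`` without renaming the attribute makes the macro
expansion refer to an undeclared function, so the mismatch is caught at compile
time.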
For simple types, there are also versions with already implemented ``_show``
and ``_store`` functions that operate on the indicated variable::

    static ulong var_ulong;
    static int var_int;
    static bool var_bool;
    static DEVICE_ULONG_ATTR(foo_ulong, mode, var_ulong);
    static DEVICE_INT_ATTR(foo_int, mode, var_int);
    static DEVICE_BOOL_ATTR(foo_bool, mode, var_bool);

To pass the address of a variable, they use an additional structure wrapping
``device_attribute``::

    struct dev_ext_attribute {
        struct device_attribute attr;
        void *var;
    };

.. _small_task_8:

Small Task #8
=============

The example driver attached to the materials unfortunately does not allow
changing the greeting, causing problems when using it in non-English-speaking
countries. Extend the driver so that the system administrator can change the
greeting:

- add a parameter ``bufsize`` to the module, which is the maximum size of the
  greeting
- at module startup, allocate (using ``kmalloc``) a buffer of the given size,
  fill it with the default greeting ``"Hello, world!\n"``, and use it instead
  of the current static buffer
- add support for the ``write`` operation, which will write to this buffer
- data should be written to the position currently indicated by the file
  descriptor
- after writing, the final position in the file should be remembered and
  considered as the current size of the greeting (for the purpose of reading)

Moreover, add a read-only sysfs file that displays the current value of
``hello_repeats``.

Literature
==========

1. `Driver APIs general documentation `_, especially:

   - `Driver Model `_, mainly "The Basic Device Structure" and "The Linux
     Kernel Device Model" for this class
   - `ioctl based interfaces `_
   - `sysfs - _The_ filesystem for exporting kernel objects `_

2. `Char devices API `_
3. A. Rubini, J. Corbet, G. Kroah-Hartman, Linux Device Drivers, 3rd edition,
   O'Reilly, 2005. (http://lwn.net/Kernel/LDD3/)
4. Books listed on the course website: http://students.mimuw.edu.pl/ZSO/

..
   =============================================================================
   Author: Grzegorz Marczyński (g.marczynski@mimuw.edu.pl)
   Update: 2003-03-13
   Update: 2004-10-20 Stanisław Paśko (sp@mimuw.edu.pl)
   Update: 2005-10-22 Piotr Malinowski (malinex@mimuw.edu.pl)
   Update: 2006-11-16 Radek Bartosiak (kedar@mimuw.edu.pl) - linux 2.6
   Update: 2012-03-18 Marcelina Kościelnicka (mwk@mimuw.edu.pl)
   Update: 2013-03-25 Marcelina Kościelnicka (mwk@mimuw.edu.pl)
   Translation: 2025-05 Gemma 3 with corrections by m.matraszek@mimuw.edu.pl
   =============================================================================