Class 9: Character Devices

Date: 29.04.2025

Additional Materials

About Devices and Drivers in Linux

A device driver is a collection of code that supports a device (usually hardware, but virtual devices exist) and exports a set of functions allowing the device to be used by the user. The driver almost always has exclusive direct access to the device. It is often responsible for reconciling the needs of different users (although this is often handled by higher layers -- the file system can be considered such a layer for a hard drive).

Device drivers are usually kernel modules. If the device is not too "close" to the processor, it is also possible to write a driver in user space (USB devices are great for this), but we will not cover this in class.

To make a driver's functionality available to user programs, you must use one of many possible communication mechanisms with the kernel. The most common are:

  • block device file -- used for hard drives and sufficiently similar devices (floppy disk, CD-ROM, SSD, ...). The driver exposes read and write functions for blocks, and the block layer handles requests from the user (buffering, queuing, etc.).

  • character device file -- used for most types of devices. The driver simply provides functions corresponding to system calls that operate on files -- the kernel passes these calls directly to the driver, allowing the implementation of any interface.

  • network interface -- the driver exposes functions for sending and receiving packets, to which the network subsystem connects. The user can use it through socket calls.

  • file in proc -- used in the case of drivers with a trivial interface (e.g., the entire driver functionality is reading/writing a single parameter).

  • file in sysfs -- as above, but newer (and simpler) interface.

File Representation

Using character and block devices involves creating a corresponding special file somewhere in the file system (almost always /dev) and opening it. This special file is only a "gateway to the kernel" and the only information about it stored in the file system is:

  • a flag indicating the device type (b -- block, c -- character)

  • a major device number

  • a minor device number

Typically, the major number chooses the driver, and the minor number chooses a specific instance of the device supported by that driver. However, there are cases where a single major number is shared by many drivers if they only export one device.

At the C language level, both numbers are packed into a single number of type dev_t. The following macros are useful (linux/kdev_t.h):

int MAJOR(dev_t number)

returns the major device number.

int MINOR(dev_t number)

returns the minor device number.

dev_t MKDEV(int major, int minor)

Packs numbers into dev_t type.
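As a rough illustration of the packing, the sketch below models the layout used by current kernels (the major number stored above 20 bits of minor number) in plain user-space C. The names k_dev_t, k_mkdev, k_major, and k_minor are hypothetical stand-ins for dev_t, MKDEV, MAJOR, and MINOR; driver code should use the macros from linux/kdev_t.h and not depend on the layout.

```c
#include <stdint.h>

/* User-space model of the kernel-internal dev_t layout.
 * Assumption: 20 minor bits, as in current linux/kdev_t.h. */
#define K_MINORBITS 20
#define K_MINORMASK ((1u << K_MINORBITS) - 1)

typedef uint32_t k_dev_t;

/* Like MKDEV: the major number goes into the high bits. */
static inline k_dev_t k_mkdev(unsigned major, unsigned minor)
{
    return (major << K_MINORBITS) | minor;
}

/* Like MAJOR: shift the minor bits away. */
static inline unsigned k_major(k_dev_t dev)
{
    return dev >> K_MINORBITS;
}

/* Like MINOR: keep only the low bits. */
static inline unsigned k_minor(k_dev_t dev)
{
    return dev & K_MINORMASK;
}
```

For example, k_mkdev(4, 64) yields 0x400040, from which k_major and k_minor recover 4 and 64.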

To create a device file manually, you can use the following command:

mknod /dev/filename type major minor

In the output of ls -l, the major and minor numbers of a device file are shown in the column that normally holds the file size.
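The same information can be read programmatically with stat. The following is a minimal user-space sketch, assuming Linux with glibc (where major() and minor() come from sys/sysmacros.h); the helper names device_type and print_device_numbers are made up for this example.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() on glibc */

/* Returns 'c' or 'b' for a character/block special file, '-' for any
 * other kind of file, and 0 if stat() fails. */
char device_type(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    if (S_ISCHR(st.st_mode))
        return 'c';
    if (S_ISBLK(st.st_mode))
        return 'b';
    return '-';
}

/* Prints "type major minor" in the order mknod expects them.
 * For files that are not device nodes, st_rdev is not meaningful. */
int print_device_numbers(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    printf("%s: %c %u %u\n", path, device_type(path),
           major(st.st_rdev), minor(st.st_rdev));
    return 0;
}
```

Calling print_device_numbers("/dev/null") prints the three values that would have to be passed to mknod to recreate that node.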

The kernel exports information about available device drivers live in the sysfs file system, and the udevd program (or systemd-udevd) monitors this information continuously and creates appropriate files in /dev. Since there is no need to pre-establish the used numbers, they are allocated dynamically. To make this mechanism work, the driver must register its device in the sysfs hierarchy.

Character Device Drivers in Linux

Registering Device Drivers

Registering a driver involves several steps:

  1. Allocate a range of device numbers:

    int alloc_chrdev_region(dev_t *dev, unsigned baseminor, unsigned count, const char *name);
    void unregister_chrdev_region(dev_t first, unsigned int count);
    

    This allocation is usually called only once, in the initialization function, allocating the entire range of numbers as a precaution.

  2. Prepare the file_operations structure describing the operations on our device. Such structures are usually global (there is no point in allocating them dynamically).

  3. Create and fill the cdev structure:

    void cdev_init(struct cdev *cdev, const struct file_operations *fops);
    

    You can also request a dynamic allocation of the structure:

    struct cdev *cdev_alloc(void);
    

    In this case, you must manually fill the ops field with a pointer to our structure (do not mix this call with cdev_init).

  4. Register our cdev structure:

    int cdev_add(struct cdev *p, dev_t dev, unsigned count);
    void cdev_del(struct cdev *p);
    

    At this point, our device becomes available to user space when someone opens the appropriate special file.

    We can attach devices individually or as a range (count parameter).

    If the cdev structure was created by cdev_alloc, the structure will be automatically released by cdev_del. If, on the other hand, it was initialized by cdev_init, releasing it is the driver's responsibility.

    Note that cdev_del only detaches the device from the device array, but does not guarantee that no one is using it anymore - previously opened file descriptors will still work (although if we are in module_exit, we are guaranteed that there are no such descriptors). In the case of implementing e.g. a device that should support hot-unplug, you must ensure this yourself e.g. by counting references.

  5. Register the device class in sysfs (or use an existing one if it fits):

    struct class my_class = {
        .name = "abc",
    };
    int class_register(struct class *class);
    void class_unregister(struct class *class);
    

    This is only done once for all our devices (or for the device type if we have many).

  6. Register our device in sysfs:

    struct device *device_create(struct class *cls, struct device *parent,
                   dev_t devt, void *drvdata, const char *fmt, ...);
    void device_destroy(struct class *cls, dev_t devt);
    

    parent points to the device to which our device is connected -- the directory in sysfs corresponding to our device will be a subdirectory of the directory of the specified device. For character device drivers corresponding to e.g. PCI devices, the parent will be set to the dev field of the pci_dev structure. You can set this parameter to NULL to receive a top-level device. drvdata can be used to store additional private information for our driver (useful if e.g. we want to create files in sysfs to control our device). fmt and subsequent parameters are formatted in printf style to create the device name that will appear in /dev.

    At this point, udevd will receive a notification about the new device and (in finite time) create the appropriate file in /dev.
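One way to observe the result of these steps from user space: a device registered with a device number gets a dev attribute file in its sysfs directory whose content is the text "major:minor", and this is the information udevd needs to create the node. The sketch below parses such a string; the function name is hypothetical, and the file would typically be found under a path like /sys/class/abc/<name>/dev for the class registered above.

```c
#include <stdio.h>

/* Parses the contents of a sysfs "dev" attribute ("major:minor\n").
 * Returns 0 on success, -1 if the text is malformed or has
 * trailing characters other than whitespace. */
int parse_sysfs_dev(const char *text, unsigned *major, unsigned *minor)
{
    char extra;

    /* Exactly two conversions must succeed; a third one means there
     * was unexpected text after the minor number. */
    if (sscanf(text, "%u:%u %c", major, minor, &extra) != 2)
        return -1;
    return 0;
}
```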

The file_operations structure

The file_operations structure (defined in linux/fs.h) describes how to perform operations on a given file. Every file (and in general everything that can be an open file descriptor) in Linux has such a structure -- for ordinary files, it is provided by the file system driver. For character devices, it is provided by the device driver. It has many fields (corresponding to operations), the most important of which are:

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t,
        loff_t *);
    long (*unlocked_ioctl) (struct file *, unsigned int,
        unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int,
        unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*release) (struct inode *, struct file *);
    /* ... */
};

The owner field must be set to THIS_MODULE -- this allows the kernel to automatically manage the module's reference counter.

Fields of the file structure

The file structure (defined in linux/fs.h) represents an open file within the kernel. It is created by the kernel when open is called and passed to all operations on the file until the last close call (i.e., when release is called). It is worth noting that an open file (the file structure) is a different thing than a file on disk (represented by the inode structure). The most important fields of this structure are:

struct file {
    fmode_t                       f_mode;
    loff_t                        f_pos;
    unsigned int                  f_flags;
    const struct file_operations  *f_op;
    void                          *private_data;

    /* ... */
};

The f_mode field allows you to determine whether the file is open for reading (FMODE_READ), writing (FMODE_WRITE), or both. You do not need to check this field in the read and write functions, because the kernel performs this test before calling the driver's function.

The f_pos field specifies the position for writing or reading (used by read, write, lseek, etc.).

The f_flags flags are mainly used to check whether the operation should be blocking or not (O_NONBLOCK), although it contains many more flags.
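From the user's side, the convention a driver is expected to follow when O_NONBLOCK is set can be observed on a pipe: a read that would otherwise sleep fails with EAGAIN. A driver's read function would typically test filp->f_flags & O_NONBLOCK and return -EAGAIN in the same situation. The helper below is a user-space sketch with a made-up name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads from an empty pipe whose read end has O_NONBLOCK set.
 * Returns the errno of the failed read (expected: EAGAIN), 0 if the
 * read unexpectedly succeeded, or -1 if the setup failed. */
int errno_of_nonblocking_read(void)
{
    int fds[2];
    char byte;
    int result = -1;

    if (pipe(fds) != 0)
        return -1;
    /* This flag is what the kernel side later sees in f_flags. */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == 0) {
        if (read(fds[0], &byte, 1) < 0)
            result = errno;  /* no data available and we must not wait */
        else
            result = 0;
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```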

The f_op field specifies a set of functions that implement file operations. This field is set (to operations from the cdev structure) by the kernel when open is called, and then it is used for all subsequent operations (the driver can replace the value of this field in open to choose an alternative set of functions).

The private_data pointer is set to NULL when the file is opened. The driver can use this pointer for its own purposes (in which case it is responsible for freeing the memory allocated for this field).

The open operation -- opening a file

The prototype of this function looks as follows:

int open(struct inode *inode, struct file *filp)

The open operation allows the driver to perform preparatory operations before other operations. Usually, the following steps are performed:

  • check for errors related to the device (e.g., check if the device is ready);

  • initialize the device if it is being opened for the first time and we are using lazy initialization;

  • identify the minor number (MINOR(inode->i_rdev)) and, if necessary, replace the set of operations pointed to by f_op;

  • allocate memory for data related to the device, initialize the data structures, and assign the private_data pointer;

The release operation -- closing a file

The kernel keeps a reference counter for each existing file structure. It can be increased, for example, by calling dup or inheriting an open file by fork. When this counter finally falls to 0 (close is called on the last descriptor, or the process holding that descriptor calls exit), the release function is called, serving as the file's destructor:

int release(struct inode *inode, struct file *filp)

Its task is to free the resources allocated in the open operation and perform similar cleanup operations:

  • free the private_data memory;

  • turn off the device when it is the last release call;
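The difference between descriptors sharing one file structure and separate file structures for the same inode is visible from user space through the file position. The sketch below (hypothetical helper name, temporary file created under /tmp) reads four bytes through one descriptor and then checks the position seen through a dup of it and through an independent open.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Reports the position seen via a dup()ed descriptor (same file
 * structure) and via a second open() (a new file structure).
 * Returns 0 on success, -1 on any setup failure. */
int compare_offsets(long *via_dup, long *via_open)
{
    char path[] = "/tmp/offset-demo-XXXXXX";
    char buf[4];
    int fd, fd_dup, fd_open;
    int rc = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    fd_dup = dup(fd);                /* increases the reference count */
    fd_open = open(path, O_RDONLY);  /* triggers a separate open() call */

    if (fd_dup >= 0 && fd_open >= 0 &&
        write(fd, "Hello, world!\n", 14) == 14 &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf)) {
        *via_dup = (long)lseek(fd_dup, 0, SEEK_CUR);
        *via_open = (long)lseek(fd_open, 0, SEEK_CUR);
        rc = 0;
    }

    if (fd_dup >= 0)
        close(fd_dup);
    if (fd_open >= 0)
        close(fd_open);
    close(fd);
    unlink(path);
    return rc;
}
```

The dup shares f_pos with the original descriptor, so it reports position 4, while the independently opened descriptor still reports 0. For a driver this means open and release are called once per file structure, not once per descriptor.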

read and write operations -- data transfer

The prototypes of these functions look as follows:

ssize_t read(struct file *filp, char __user *buff, size_t count,
             loff_t *offp)
ssize_t write(struct file *filp, const char __user *buff, size_t count,
                loff_t *offp)

The task of the read operation is to copy a portion of data from the kernel address space to a specified address (buff) in the user address space. The write operation works in the opposite direction. These functions are used to implement many system calls (read, pread, readv, ...).

offp is a pointer to the current position in the file. If such a position makes sense for our file, we take it from there and write the updated position value there. For a regular read and write, it will be a pointer to filp->f_pos, and for pread and pwrite it will be a pointer to a variable on the stack containing the system call parameter.
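The handling of offp can be sketched with a user-space model of a read function over a fixed in-memory buffer. This is not kernel code: model_read is a hypothetical name, the buffer is passed explicitly, and memcpy stands in for copy_to_user, which in a real driver can fail and would then lead to returning -EFAULT. The kernel helper simple_read_from_buffer implements similar logic for drivers.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t, off_t */

/* Model of a driver read over the buffer data[0..size).
 * The position is taken from *offp and written back there. */
ssize_t model_read(const char *data, size_t size,
                   char *buff, size_t count, off_t *offp)
{
    size_t pos, avail;

    if (*offp < 0)
        return -EINVAL;
    pos = (size_t)*offp;
    if (pos >= size)
        return 0;               /* nothing left: end of file */

    avail = size - pos;
    if (count > avail)
        count = avail;          /* transfer only what is available */

    memcpy(buff, data + pos, count);  /* copy_to_user in the kernel */
    *offp += (off_t)count;            /* works for read and pread alike */
    return (ssize_t)count;
}
```

Because the function only touches *offp and never filp->f_pos directly, the same code serves both read (where offp points at f_pos) and pread (where it points at a temporary variable).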

The value returned by this function will be interpreted as follows:

  • a value greater than zero indicates the number of bytes copied; if it is equal to the value of the argument passed to the system call, it indicates complete success; if it is smaller, it means that only part of the data was transferred - then it should be expected that the program will repeat the system call (e.g., this is the standard behavior of the library functions fread/fwrite)

  • if the value is equal to 0, the end of the file has been reached (used only in read)

  • a negative value indicates an error

When implementing these operations, remember to maintain the correct semantics - returning an error means that no bytes were read/written. If our driver detects an error only after a certain amount of transferred bytes (and there is no easy way to undo the transfer), return the number of transferred bytes instead of an error - the error code will be returned when the user repeats the operation for the remaining bytes.
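Seen from the other side, a user program that wants a complete transfer has to loop over short reads. The sketch below shows one common way to write such a loop in user space; read_full is a made-up name, and it applies the same rule as described above by reporting the bytes already transferred when an error occurs in the middle.

```c
#include <errno.h>
#include <unistd.h>

/* Keeps calling read() until len bytes have arrived, end of file is
 * reached, or an error occurs. Returns the number of bytes read, or
 * -1 if the very first read failed. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            /* Partial success wins over the error; the error will
             * be reported again on the caller's next attempt. */
            return done > 0 ? (ssize_t)done : -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```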

llseek operation -- changing the file position

The prototype of this function looks as follows:

loff_t llseek(struct file *filp, loff_t off, int whence)

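The llseek operation implements the system calls lseek and llseek. In older kernels, the default behavior when the llseek operation was not specified in the driver's operations was to change the f_pos field of the file structure. In recent kernels, a file whose operations do not specify llseek is treated as unseekable and lseek fails with -ESPIPE, so a driver for which a file position makes sense has to provide an implementation (its own, or a generic helper such as default_llseek). If the concept of changing the file position does not make sense for the device, lseek should fail with -ESPIPE; older kernels provided a ready-made function no_llseek for this purpose, which always returned -ESPIPE.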
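For a device backed by a buffer of fixed size, the position arithmetic of an llseek implementation can be modelled in user space as below. model_llseek is a hypothetical name and the position is passed explicitly instead of living in filp->f_pos; in the kernel, the helper fixed_size_llseek covers this case. Overflow of the additions is ignored in this sketch.

```c
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Computes the new position for a file of the given size.
 * Returns the new position, or -EINVAL for an unknown whence or a
 * target outside 0..size. *pos is only changed on success. */
long long model_llseek(long long *pos, long long size,
                       long long off, int whence)
{
    long long target;

    switch (whence) {
    case SEEK_SET:
        target = off;           /* absolute position */
        break;
    case SEEK_CUR:
        target = *pos + off;    /* relative to the current position */
        break;
    case SEEK_END:
        target = size + off;    /* relative to the end */
        break;
    default:
        return -EINVAL;
    }
    if (target < 0 || target > size)
        return -EINVAL;
    *pos = target;
    return target;
}
```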

The ioctl operation -- invoking device-specific commands

Function prototypes look as follows:

long (*unlocked_ioctl) (struct file *filp, unsigned int cmd,
                unsigned long arg);
long (*compat_ioctl) (struct file *filp, unsigned int cmd,
                unsigned long arg);

The unlocked_ioctl function handles ioctl calls made by programs built for the kernel's native architecture. The name is a historical artifact from the big kernel lock era -- drivers requiring the big kernel lock used to fill the ioctl field, while newer or converted drivers using their own locks used unlocked_ioctl. In current kernel versions, the big kernel lock and the ioctl field no longer exist.

The compat_ioctl function handles ioctl calls from user programs in compatibility mode with the 32-bit architecture version – e.g., programs for the i386 architecture under a kernel for the x86_64 architecture. If the structures passed through ioctl do not contain fields of architecture-dependent size, both fields can be set to the same function.
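What "no fields of architecture-dependent size" can look like in practice is sketched below with a hypothetical argument structure. The structure name and fields are invented for illustration; kernel headers would normally use __u64 and __u32 from linux/types.h where this user-space sketch uses stdint.h types.

```c
#include <stddef.h>
#include <stdint.h>

/* A layout such as
 *     struct { void *buf; long len; };
 * is 8 bytes for an i386 program but 16 bytes for an x86_64 one, so
 * compat_ioctl would have to translate it. Fixed-width fields avoid
 * that. The 64-bit field comes first because its alignment differs
 * between i386 (4 bytes) and x86_64 (8 bytes), which could otherwise
 * introduce padding on only one of them. */
struct hello_xfer {
    uint64_t buf_addr;  /* user pointer carried as an integer */
    uint32_t len;       /* buffer length in bytes */
    uint32_t flags;
};

/* Checked at compile time on every architecture the code is built for. */
_Static_assert(sizeof(struct hello_xfer) == 16,
               "hello_xfer layout must not depend on the ABI");
_Static_assert(offsetof(struct hello_xfer, len) == 8,
               "unexpected padding in hello_xfer");
```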

The first argument corresponds to the file descriptor passed by the system call. The cmd argument is exactly the same as in the system call. The optional arg argument is passed as an unsigned long number regardless of the type used in the system call.

Typically, the implementation of the ioctl operation simply contains a switch statement selecting the appropriate behavior depending on the value of the cmd argument. Different commands are represented by different numbers, which are usually given names using preprocessor definitions. The user program should be able to include a header file with such declarations (usually the same one that is used when compiling the driver module).

It is the responsibility of the driver interface developer to determine the numerical values corresponding to the commands interpreted by the driver. A simple choice, assigning successive small values to individual commands, unfortunately is generally not a good solution. Commands should be unique across the system to avoid errors when a correct command is sent to an incorrect device. This situation may not occur very often, but its consequences can be serious. With different commands for all ioctl calls, in case of a mistake, -ENOTTY will be returned instead of performing an unintended action.
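The switch-based dispatch and the -ENOTTY fallback can be sketched as a user-space model. Everything named HELLO_* or model_* here is hypothetical, the command numbers are built with the macros from linux/ioctl.h described below, and the model dereferences arg directly, which a real driver must never do (it would use get_user/put_user or copy_from_user/copy_to_user instead).

```c
#include <errno.h>
#include <linux/ioctl.h>

#define HELLO_IOC_MAGIC   'H'   /* assumed magic number for this example */
#define HELLO_SET_REPEATS _IOW(HELLO_IOC_MAGIC, 1, int)
#define HELLO_GET_REPEATS _IOR(HELLO_IOC_MAGIC, 2, int)

struct model_dev {
    int repeats;
};

/* Model of an unlocked_ioctl handler: one case per known command,
 * -ENOTTY for everything else. */
long model_ioctl(struct model_dev *dev, unsigned int cmd, unsigned long arg)
{
    int *argp = (int *)arg;     /* user pointer in a real driver */

    switch (cmd) {
    case HELLO_SET_REPEATS:
        if (*argp < 0)
            return -EINVAL;     /* validate data coming from the user */
        dev->repeats = *argp;
        return 0;
    case HELLO_GET_REPEATS:
        *argp = dev->repeats;
        return 0;
    default:
        return -ENOTTY;         /* command meant for some other device */
    }
}
```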

The following macros (defined in linux/ioctl.h) should be used when determining numerical values for commands:

_IO(type, nr)

general-purpose command (without an argument)

_IOR(type, nr, dataitem)

command with write to user space

_IOW(type, nr, dataitem)

command with read from user space

_IOWR(type, nr, dataitem)

command with write and read

Designations:

type

unique number for the driver (8 bits, selected after reviewing Documentation/userspace-api/ioctl/ioctl-number.rst) – a magic number

nr

sequential command number (8 bits)

dataitem

structure associated with the command; its size usually cannot exceed 16 KiB - 1 bytes (the limit depends on the value of _IOC_SIZEBITS). Encoding the size of the structure being written/read in the command number helps detect programs compiled against an outdated version of the driver's header and helps avoid, for example, writing beyond the buffer.

Example:

#define DN_SETCOUNT    _IOW(0, 3, int)
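The fields packed into a command number can be extracted again with the _IOC_DIR, _IOC_TYPE, _IOC_NR, and _IOC_SIZE macros from the same header, which is handy when debugging. The sketch below uses a hypothetical command set with the magic number 'H'; the header linux/ioctl.h is also usable from user space when the kernel headers are installed.

```c
#include <linux/ioctl.h>
#include <stdio.h>

#define HELLO_IOC_MAGIC   'H'   /* assumed magic number for this example */
#define HELLO_RESET       _IO(HELLO_IOC_MAGIC, 0)
#define HELLO_SET_REPEATS _IOW(HELLO_IOC_MAGIC, 1, int)  /* user -> driver */
#define HELLO_GET_REPEATS _IOR(HELLO_IOC_MAGIC, 2, int)  /* driver -> user */

/* Prints the four fields encoded in an ioctl command number. */
void print_ioctl_fields(unsigned int cmd)
{
    printf("cmd=0x%08x dir=%u type='%c' nr=%u size=%u\n",
           cmd,
           (unsigned)_IOC_DIR(cmd),
           (int)_IOC_TYPE(cmd),
           (unsigned)_IOC_NR(cmd),
           (unsigned)_IOC_SIZE(cmd));
}
```

Note that the direction is named from the user's point of view: _IOW marks a command whose argument the user program writes and the driver reads.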

sysfs

In the kernel, there is often a need to grant access to certain device data to user space. Using character devices for this purpose is rather cumbersome – a character device is a fairly "heavy" object, and access to it is done through a limited read/write interface or an inconvenient ioctl interface.

The first solution to these problems in Linux was the proc file system, allowing easy creation of a large number of special files for communication with the user. However, it had several drawbacks: primarily the lack of structure (everyone puts files wherever they like), and data transfer requires costly and delicate formatting and parsing of the byte stream.

To solve the problems with the proc file system, the sysfs file system was created. It has the following features:

  • every device, driver, module, etc., in the system is based on a kobject structure and automatically receives a directory in sysfs

  • directories in sysfs are organized hierarchically – in the case of devices, each device is a subdirectory of the device it is connected to

  • relationships between objects are represented by symbolic links

  • object attributes are represented by files

  • it is mounted in /sys

To grant a user access to some functionality, you need to get access to your kobject structure and attach attributes to it. In the case of devices, the kobject structure is the kobj field of the device structure.

Adding Attributes to Devices

To add an attribute (i.e., a file representing a single parameter) to an object, you need to create a structure wrapping the attribute structure appropriate for the object type. In the case of devices, this is device_attribute:

struct attribute {
    const char *name;
    umode_t mode;
};

struct device_attribute {
    struct attribute attr;
    ssize_t (*show) (struct device *dev, struct device_attribute *attr, char *buf);
    ssize_t (*store) (struct device *dev, struct device_attribute *attr, const char *buf, size_t count);
};

And add it to our device through:

int device_create_file(struct device *device, struct device_attribute *entry);
void device_remove_file(struct device *dev, struct device_attribute *attr);

The show function is called when an attribute is read by the user and has access to a PAGE_SIZE buffer to which it can write the attribute value (and return its size). The store function is called when an attribute is written and receives its complete value. Unlike the read/write functions, these functions always operate on the entire attribute value (no need to handle partial reads/writes).
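The whole-value contract of show and store can be sketched with a user-space model. The names model_show, model_store, and MODEL_PAGE_SIZE are hypothetical, the device and attribute parameters are left out, and snprintf and strtol stand in for the kernel helpers sysfs_emit and kstrtoint that a driver would normally use.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define MODEL_PAGE_SIZE 4096    /* stand-in for PAGE_SIZE */

static int hello_repeats = 1;   /* the parameter being exported */

/* show: format the complete value into buf and return its length. */
ssize_t model_show(char *buf)
{
    return snprintf(buf, MODEL_PAGE_SIZE, "%d\n", hello_repeats);
}

/* store: buf holds the complete, NUL-terminated value written by the
 * user. Returns count on success so that nothing is left to retry. */
ssize_t model_store(const char *buf, size_t count)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(buf, &end, 10);
    if (end == buf || errno != 0 || value < INT_MIN || value > INT_MAX)
        return -EINVAL;
    if (*end == '\n')
        end++;                  /* tolerate the newline added by echo */
    if (*end != '\0')
        return -EINVAL;         /* trailing garbage */

    hello_repeats = (int)value;
    return (ssize_t)count;
}
```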

Binary Attributes

Sometimes simple attributes are not enough and it is necessary to export binary data, defining a custom implementation of read/write as with a regular file. This is served by binary attributes:

struct bin_attribute {
    struct attribute attr;
    size_t size;
    void *private;
    ssize_t (*read) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr,
                     char *buf, loff_t off, size_t size);
    ssize_t (*write) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr,
                      char *buf, loff_t off, size_t size);
    int (*mmap) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr,
                 struct vm_area_struct *vma);
};
int sysfs_create_bin_file(struct kobject *kobj, const struct bin_attribute *attr);
void sysfs_remove_bin_file(struct kobject *kobj, const struct bin_attribute *attr);

Directories

For more complex devices, it may be useful to have the ability to create a tree-like structure of objects. In this case, it is necessary to create your own object type:

struct kobj_type {
    void (*release)(struct kobject *kobj);
    const struct sysfs_ops *sysfs_ops;
    struct attribute **default_attrs;
};
struct sysfs_ops {
    ssize_t (*show)(struct kobject *kobj, struct attribute *attr, char *);
    ssize_t (*store)(struct kobject *kobj, struct attribute *attr, const char *, size_t);
};

The release function will be called by the kernel when all references to the object disappear. sysfs_ops contains functions handling attributes -- e.g., in the case of struct device, these are simply functions casting kobject to device, attribute to device_attribute, and calling show/store from the given attribute. default_attrs is a pointer to an array of pointers to attributes (terminated by NULL), which will be added to the objects of the given type upon creation. In recent kernels this field has been replaced by default_groups, a NULL-terminated array of pointers to attribute_group structures.

To create a new object, use the function:

int kobject_init_and_add(
        struct kobject *kobj,
        struct kobj_type *ktype,
        struct kobject *parent,
        const char *fmt, ...);

Before calling it, you should initialize kobj with zeros.

To remove it (or rather, get rid of your reference - the object will be deleted when all references disappear), use:

void kobject_put(struct kobject *kobj);

We can also duplicate our reference by:

struct kobject *kobject_get(struct kobject *kobj);
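The get/put discipline and the role of the release callback can be modelled in a few lines of user-space C. All names below are hypothetical, and the plain counter is a simplification: the kernel uses an atomic reference count, and a real release function would free the structure containing the kobject.

```c
#include <stddef.h>

struct model_kobject {
    unsigned refcount;
    void (*release)(struct model_kobject *obj);
};

/* Like kobject_init_and_add: the creator holds the first reference. */
void model_init(struct model_kobject *obj,
                void (*release)(struct model_kobject *obj))
{
    obj->refcount = 1;
    obj->release = release;
}

/* Like kobject_get: take an additional reference. */
struct model_kobject *model_get(struct model_kobject *obj)
{
    obj->refcount++;
    return obj;
}

/* Like kobject_put: drop a reference; the last one triggers release. */
void model_put(struct model_kobject *obj)
{
    if (--obj->refcount == 0 && obj->release != NULL)
        obj->release(obj);
}

/* Example release callback that only counts how often it ran. */
static int released_count;

void count_release(struct model_kobject *obj)
{
    (void)obj;
    released_count++;
}
```

The consequence for drivers is that memory holding a kobject must be freed from the release callback and not right after the final put, since other holders may still keep the object alive.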

To add attributes to such an object after its creation:

int sysfs_create_file(struct kobject *kobj, const struct attribute *attr);
void sysfs_remove_file(struct kobject *kobj, const struct attribute *attr);

Relationships between Objects

Creating symbolic links in sysfs is possible through the functions:

int sysfs_create_link(struct kobject *kobj, struct kobject *target, char *name);
void sysfs_remove_link(struct kobject *kobj, char *name);

Useful Macros

Instead of manually creating *_attribute structures, you can use macros to simplify the writing process when creating attributes:

static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);

This is the same as manual declaration:

static struct device_attribute dev_attr_foo = {
    .attr = {
        .name = "foo",
        .mode = S_IWUSR | S_IRUGO,
    },
    .show = show_foo,
    .store = store_foo,
};

Going further down this path, you can do even more:

static DEVICE_ATTR_RW(foo_rw);
static DEVICE_ATTR_RO(foo_ro);
static DEVICE_ATTR_WO(foo_wo);

These macros assume that the functions are named accordingly, foo_rw_show, foo_rw_store, foo_ro_show, and foo_wo_store.
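How the naming convention arises can be seen in a simplified user-space model of such a macro, which uses token pasting to derive both the variable name and the callback names from the attribute name. MODEL_ATTR_RW and struct model_attr are invented for this sketch, and the callbacks take fewer parameters than the real ones.

```c
#include <stdio.h>
#include <sys/types.h>

struct model_attr {
    const char *name;
    unsigned mode;
    ssize_t (*show)(char *buf);
    ssize_t (*store)(const char *buf, size_t count);
};

/* Model of DEVICE_ATTR_RW(name): #_name becomes the file name, and
 * _name##_show / _name##_store must already be declared. */
#define MODEL_ATTR_RW(_name)                            \
    struct model_attr model_attr_##_name = {            \
        .name = #_name,                                 \
        .mode = 0644,                                   \
        .show = _name##_show,                           \
        .store = _name##_store,                         \
    }

static ssize_t foo_rw_show(char *buf)
{
    return sprintf(buf, "foo\n");
}

static ssize_t foo_rw_store(const char *buf, size_t count)
{
    (void)buf;              /* a real store would parse the value */
    return (ssize_t)count;
}

/* Expands to a definition of model_attr_foo_rw. */
static MODEL_ATTR_RW(foo_rw);
```

With the real macros the resulting variable is called dev_attr_foo_rw, which is the name to pass to device_create_file.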

For simple types, there are also versions with already implemented _show and _store functions that operate on the indicated variable:

static ulong var_ulong;
static int var_int;
static bool var_bool;

static DEVICE_ULONG_ATTR(foo_ulong, mode, var_ulong);
static DEVICE_INT_ATTR(foo_int, mode, var_int);
static DEVICE_BOOL_ATTR(foo_bool, mode, var_bool);

To pass the address of a variable, they use an additional structure wrapping device_attribute:

struct dev_ext_attribute {
    struct device_attribute attr;
    void *var;
};
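The generic callbacks receive only a pointer to the inner attribute, so they have to step back to the enclosing structure to find var. The user-space sketch below models that step with a container_of-style macro; the model_* names are hypothetical and the device parameter is omitted.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Same idea as the kernel's container_of: subtract the member offset
 * from the member's address to get the enclosing structure. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct model_attr {
    const char *name;
};

struct model_ext_attr {
    struct model_attr attr;
    void *var;              /* address of the exported variable */
};

/* Model of a generic show callback for int variables. */
ssize_t model_show_int(struct model_attr *attr, char *buf)
{
    struct model_ext_attr *ea =
        model_container_of(attr, struct model_ext_attr, attr);

    return sprintf(buf, "%d\n", *(int *)ea->var);
}
```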

Small Task #8

The example driver attached to the materials unfortunately does not allow changing the greeting, causing problems when using it in non-English-speaking countries.

Add the ability to change the greeting by the system administrator to the driver:

  • add a parameter bufsize to the module, which is the maximum size of the greeting

  • at module startup, allocate (using kmalloc) a buffer of the given size, fill it with the default greeting "Hello, world!\n", and use it instead of the current static buffer

  • add support for the write operation, which will write to this buffer

    • data should be written to the position currently indicated by the file descriptor

    • after writing, the final position in the file should be remembered and considered as the current size of the greeting (for the purpose of reading)

Moreover, add a sysfs file that will read-only display the current value of hello_repeats.

Literature

  1. Driver APIs general documentation, especially the Char devices API

  2. A. Rubini, J. Corbet, G. Kroah-Hartman, Linux Device Drivers, 3rd edition, O'Reilly, 2005. (http://lwn.net/Kernel/LDD3/)

  3. Books listed on the course website: http://students.mimuw.edu.pl/ZSO/