Class 9: Character Devices¶
Date: 29.04.2025
Additional Materials¶
About Devices and Drivers in Linux¶
A device driver is a collection of code that supports a device (usually hardware, but virtual devices exist) and exports a set of functions allowing the device to be used by the user. The driver almost always has exclusive direct access to the device. It is often responsible for reconciling the needs of different users (although this is often handled by higher layers -- the file system can be considered such a layer for a hard drive).
Device drivers are usually kernel modules. If the device is not too "close" to the processor, it is also possible to write a driver in user space (USB devices are great for this), but we will not cover this in class.
To make a driver's functionality available to user programs, you must use one of many possible communication mechanisms with the kernel. The most common are:
block device file -- used for hard drives and sufficiently similar devices (floppy disk, CD-ROM, SSD, ...). The driver exposes read and write functions for blocks, and the block layer handles requests from the user (buffering, queuing, etc.).
character device file -- used for most types of devices. The driver simply provides functions corresponding to system calls that operate on files -- the kernel passes these calls directly to the driver, allowing the implementation of any interface.
network interface -- the driver exposes functions for sending and receiving packets, to which the network subsystem connects. The user can use it through socket calls.
file in proc -- used in the case of drivers with a trivial interface (e.g., the entire driver functionality is reading/writing a single parameter).
file in sysfs -- as above, but a newer (and simpler) interface.
File Representation¶
Using character and block devices involves creating
a corresponding special file somewhere in the file system (almost always
/dev) and opening it. This special file is only a "gateway to the kernel"
and the only information about it stored in the file system is:
a flag indicating the device type (b -- block, c -- character)
a major device number
a minor device number
Typically, the major number chooses the driver, and the minor number chooses a specific instance of the device supported by that driver. However, there are cases where a single major number is shared by many drivers if they only export one device.
At the C language level, both numbers are packed into a single number of type dev_t.
The following macros are useful (linux/kdev_t.h):
int MAJOR(dev_t number) -- returns the major device number.
int MINOR(dev_t number) -- returns the minor device number.
dev_t MKDEV(int major, int minor) -- packs both numbers into a dev_t value.
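The kernel macros above are only usable in kernel code. As a minimal sketch of the same packing idea, the following uses the user-space counterparts major(), minor() and makedev() from <sys/sysmacros.h> (an assumption: a Linux system with glibc). The helper name pack_devnum is made up for illustration, and the exact bit layout of dev_t in user space need not match the kernel's.

```c
#include <assert.h>
#include <sys/sysmacros.h>  /* major(), minor(), makedev() */
#include <sys/types.h>      /* dev_t */

/* Pack a (major, minor) pair into a single dev_t value. */
static dev_t pack_devnum(unsigned int maj, unsigned int min)
{
    dev_t dev = makedev(maj, min);

    /* Unpacking must give back exactly what was packed. */
    assert(major(dev) == maj);
    assert(minor(dev) == min);
    return dev;
}
```

Two device files that share a major number but differ in the minor number therefore carry different dev_t values, which is how one driver can tell its device instances apart.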
To create a device file manually, you can use the following command:
mknod /dev/filename type major minor
In the output of ls -l, the major and minor numbers of a device file are shown in the column that normally contains the file size.
In practice, device files are rarely created by hand. The kernel exports
live information about the available devices and drivers in the sysfs file
system, and the udevd program (or systemd-udevd) monitors this information
continuously and creates the appropriate files in /dev. Because the numbers
no longer have to be agreed on in advance, they are allocated dynamically.
For this mechanism to work, the driver must register its device in the
sysfs hierarchy.
Character Device Drivers in Linux¶
Registering Device Drivers¶
Registering a driver involves several steps:
Allocate a range of device numbers:
int alloc_chrdev_region(dev_t *dev, unsigned baseminor, unsigned count, const char *name);
void unregister_chrdev_region(dev_t first, unsigned int count);
This allocation is usually done only once, in the initialization function, reserving up front the whole range of numbers the driver may need.
Prepare the file_operations structure describing the operations on our device. Such structures are usually global (there is no point in allocating them dynamically).
Create and fill the cdev structure:
void cdev_init(struct cdev *cdev, const struct file_operations *fops);
You can also request a dynamic allocation of the structure:
struct cdev *cdev_alloc(void);
In this case, you must manually fill the ops field with a pointer to our structure (do not combine this call with cdev_init).
Register our cdev structure:
int cdev_add(struct cdev *p, dev_t dev, unsigned count);
void cdev_del(struct cdev *p);
At this point, our device becomes available to user space when someone opens the appropriate special file.
We can attach devices individually or as a range (the count parameter).
If the cdev structure was created by cdev_alloc, it will be released automatically by cdev_del. If, on the other hand, it was initialized by cdev_init, releasing it is the driver's responsibility.
Note that cdev_del only detaches the device from the device array, but does not guarantee that no one is using it anymore: previously opened file descriptors will still work (although if we are in module_exit, we are guaranteed that there are no such descriptors). When implementing e.g. a device that should support hot-unplug, you must handle this yourself, e.g. by counting references.
Register the device class in sysfs (or use an existing one if it fits):
struct class my_class = {
    .name = "abc",
};
int class_register(struct class *class);
void class_unregister(struct class *class);
This is done only once for all our devices (or once per device type if we have many).
Register our device in sysfs:
struct device *device_create(struct class *cls, struct device *parent, dev_t devt, void *drvdata, const char *fmt, ...);
void device_destroy(struct class *cls, dev_t devt);
parent points to the device to which our device is connected -- the directory in sysfs corresponding to our device will be a subdirectory of the directory of the specified device. For character device drivers corresponding to e.g. PCI devices, the parent will be set to the dev field of the pci_dev structure. You can set this parameter to NULL to get a top-level device.
drvdata can be used to store additional private information for our driver (useful if e.g. we want to create files in sysfs to control our device).
fmt and the subsequent parameters form a printf-style format that produces the device name that will appear in /dev.
At this point, udevd will receive a notification about the new device and (in finite time) create the appropriate file in /dev.
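The steps above can be sketched as a single kernel module. This is only an illustration under stated assumptions: all example_* names, the class name "example" and the range size are hypothetical, the operations structure is left almost empty, and the code is meant to be built against kernel headers as an out-of-tree module (it cannot run in user space).

```c
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/err.h>

#define EXAMPLE_MAX_DEVICES 16  /* hypothetical size of the reserved range */

static dev_t example_first;     /* first device number of the range */
static struct cdev example_cdev;

static const struct file_operations example_fops = {
    .owner = THIS_MODULE,
    /* .open, .read, .write, ... would be filled in here */
};

static struct class example_class = {
    .name = "example",
};

static int __init example_init(void)
{
    struct device *dev;
    int err;

    /* Reserve a range of device numbers. */
    err = alloc_chrdev_region(&example_first, 0, EXAMPLE_MAX_DEVICES,
                              "example");
    if (err)
        return err;

    /* Bind the operations to a cdev and attach one device number. */
    cdev_init(&example_cdev, &example_fops);
    err = cdev_add(&example_cdev, example_first, 1);
    if (err)
        goto err_region;

    /* Register the class in sysfs. */
    err = class_register(&example_class);
    if (err)
        goto err_cdev;

    /* Announce the device; udevd is expected to create /dev/example0. */
    dev = device_create(&example_class, NULL, example_first, NULL,
                        "example%d", 0);
    if (IS_ERR(dev)) {
        err = PTR_ERR(dev);
        goto err_class;
    }
    return 0;

err_class:
    class_unregister(&example_class);
err_cdev:
    cdev_del(&example_cdev);
err_region:
    unregister_chrdev_region(example_first, EXAMPLE_MAX_DEVICES);
    return err;
}

static void __exit example_exit(void)
{
    /* Undo the registration in reverse order. */
    device_destroy(&example_class, example_first);
    class_unregister(&example_class);
    cdev_del(&example_cdev);
    unregister_chrdev_region(example_first, EXAMPLE_MAX_DEVICES);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
```

The error path releases resources in the reverse order of their acquisition, which keeps the init function safe to fail at any step.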
The file_operations structure¶
The file_operations structure (defined in linux/fs.h) describes how
to perform operations on a given file. Every file (and in general everything that can
be an open file descriptor) in Linux has such a structure -- for ordinary
files, it is provided by the file system driver. For character devices,
it is provided by the device driver. It has many fields
(corresponding to operations), the most important of which are:
struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t,
                      loff_t *);
    long (*unlocked_ioctl) (struct file *, unsigned int,
                            unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int,
                          unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*release) (struct inode *, struct file *);
    /* ... */
};
The owner field must be set to THIS_MODULE --
this allows the kernel to manage the module's reference counter automatically.
Fields of the file structure¶
The file structure (defined in linux/fs.h) represents an open file within the kernel.
It is created by the kernel when open is called and passed to all operations
on the file until the last close call (i.e., when release is called).
It is worth noting that an open file (the file structure) is a different thing
than a file on disk (represented by the inode structure).
The most important fields of this structure are:
struct file {
    fmode_t f_mode;
    loff_t f_pos;
    unsigned int f_flags;
    const struct file_operations *f_op;
    void *private_data;
    /* ... */
};
The f_mode field allows you to determine whether the file is open for
reading (FMODE_READ), writing (FMODE_WRITE), or both.
You do not need to check this field in the read and write functions,
because the kernel performs this test before calling the driver's function.
The f_pos field specifies the position for writing or reading
(used by read, write, lseek, etc.).
The f_flags field is mainly used to check whether an operation
should be blocking or not (O_NONBLOCK), although it contains many more flags.
The f_op field specifies a set of functions that implement file operations.
This field is set (to operations from the cdev structure) by the kernel
when open is called, and then it is used for all subsequent operations
(the driver can replace the value of this field in open to choose
an alternative set of functions).
The private_data pointer is set to NULL when the file is opened.
The driver can use this pointer for its own purposes
(in which case it is responsible for freeing the memory allocated for this field).
The open operation -- opening a file¶
The prototype of this function looks as follows:
int open(struct inode *inode, struct file *filp)
The open operation allows the driver to perform preparatory operations before other operations.
Usually, the following steps are performed:
check for errors related to the device (e.g., check if the device is ready);
initialize the device if it is being opened for the first time and we are using lazy initialization;
identify the minor number (MINOR(inode->i_rdev)) and, if necessary, replace the set of operations pointed to by f_op;
allocate memory for data related to the device, initialize the data structures, and assign the private_data pointer.
The release operation -- closing a file¶
The kernel keeps a reference counter for each existing file structure.
It can be increased, for example, by calling dup or inheriting an open file by fork.
When this counter finally falls to 0
(close is called on the last descriptor, or the process holding that descriptor calls exit),
the release function is called, serving as the file's destructor:
int release(struct inode *inode, struct file *filp)
Its task is to free the resources allocated in the open operation
and perform similar cleanup operations:
free the private_data memory;
turn off the device when it is the last release call.
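A minimal sketch of how open and release typically pair up around private_data. The structure example_ctx and both function names are hypothetical, and this is kernel code meant to be built as part of a module, not a complete driver.

```c
#include <linux/fs.h>
#include <linux/slab.h>

/* Hypothetical per-open state; the fields are illustrative only. */
struct example_ctx {
    unsigned int minor;   /* which device instance was opened */
    size_t bytes_read;    /* example of state kept per open file */
};

static int example_open(struct inode *inode, struct file *filp)
{
    struct example_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);

    if (!ctx)
        return -ENOMEM;

    ctx->minor = iminor(inode);   /* same value as MINOR(inode->i_rdev) */
    filp->private_data = ctx;     /* visible to all later operations */
    return 0;
}

static int example_release(struct inode *inode, struct file *filp)
{
    /* Called once, when the last reference to this struct file is dropped. */
    kfree(filp->private_data);
    return 0;
}
```

Because release runs once per struct file and not once per close call, the allocation made in open is freed exactly once even if the descriptor was duplicated with dup or inherited through fork.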
read and write operations -- data transfer¶
The prototypes of these functions look as follows:
ssize_t read(struct file *filp, char __user *buff, size_t count,
loff_t *offp)
ssize_t write(struct file *filp, const char __user *buff, size_t count,
loff_t *offp)
The task of the read operation is to copy a portion of data from the
kernel address space to a specified address (buff) in the user address
space. The write operation works in the opposite direction. These
functions are used to implement many system calls (read, pread,
readv, ...).
offp is a pointer to the current position in the file. If such a
position makes sense for our file, we take it from there and write the
updated position value there. For a regular read and write, it will
be a pointer to filp->f_pos, and for pread and pwrite it will
be a pointer to a variable on the stack containing the system call
parameter.
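The handling of offp can be illustrated with a user-space model of a read operation over a fixed in-memory buffer. This is a sketch, not driver code: model_read is a hypothetical name, memcpy stands in for copy_to_user (whose failure a real driver would report as -EFAULT), and the return values follow the usual read convention of byte count, 0 at the end of data, or a negative error code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t, off_t */

/* Copy up to count bytes starting at *offp and advance the position. */
static ssize_t model_read(const char *data, size_t size,
                          char *dst, size_t count, off_t *offp)
{
    assert(data && dst && offp);

    if (*offp < 0)
        return -EINVAL;                 /* invalid position */
    if ((size_t)*offp >= size)
        return 0;                       /* at or past the end: nothing left */
    if (count > size - (size_t)*offp)
        count = size - (size_t)*offp;   /* clamp to the remaining data */

    memcpy(dst, data + *offp, count);   /* copy_to_user() in a real driver */
    *offp += (off_t)count;              /* write the updated position back */
    return (ssize_t)count;
}
```

The function never touches a global position: it only reads and updates the value behind offp, so the same code serves both read (position kept in the file structure) and pread (position supplied by the caller).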
The value returned by this function will be interpreted as follows:
a value greater than zero indicates the number of bytes copied; if it is equal to the value of the argument passed to the system call, it indicates complete success; if it is smaller, only part of the data was transferred, and the program should be expected to repeat the system call (e.g., this is the standard behavior of the library functions fread/fwrite)
if the value is equal to 0, the end of the file has been reached (used only in read)
a negative value indicates an error
When implementing these operations, remember to maintain the correct semantics - returning an error means that no bytes were read/written. If our driver detects an error only after a certain amount of transferred bytes (and there is no easy way to undo the transfer), return the number of transferred bytes instead of an error - the error code will be returned when the user repeats the operation for the remaining bytes.
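Seen from the other side, a user program that needs a fixed amount of data has to cope with short transfers by repeating the call. A minimal sketch of such a loop follows; the helper name read_full is hypothetical, and retrying on EINTR is a common convention rather than something the text above prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read exactly count bytes unless the end of file or an error comes first. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL);
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n > 0) {
            done += (size_t)n;   /* partial transfer: ask for the rest */
        } else if (n == 0) {
            break;               /* end of file */
        } else if (errno == EINTR) {
            continue;            /* interrupted before any data: retry */
        } else {
            /* Keep the bytes already read; the error shows up again
             * on the next call. */
            return done > 0 ? (ssize_t)done : -1;
        }
    }
    return (ssize_t)done;
}
```

The error branch mirrors the driver-side rule: bytes that were already transferred are reported first, and the error is returned only by a call that transferred nothing.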
llseek operation -- changing the file position¶
The prototype of this function looks as follows:
loff_t llseek(struct file *filp, loff_t off, int whence)
The llseek operation implements the system calls lseek and
llseek. If the concept of changing the file position does not make
sense for the device, the file should not be seekable, so that lseek
fails with -ESPIPE. In current kernels this is what happens when the
llseek operation is left unset in the driver's operations. Older kernels
instead fell back to a default implementation that simply changed the
f_pos field of the file structure, so a driver had to opt out explicitly
by setting llseek to the ready-made function no_llseek, which always
returned -ESPIPE; that function has since been removed from the kernel.
The ioctl operation -- invoking device-specific commands¶
Function prototypes look as follows:
long (*unlocked_ioctl) (struct file *filp, unsigned int cmd,
unsigned long arg);
long (*compat_ioctl) (struct file *filp, unsigned int cmd,
unsigned long arg);
The unlocked_ioctl function handles ioctl calls from programs built
for the kernel's native ("main") architecture. The name is a historical
artifact from the big kernel lock era -- drivers requiring the big kernel
lock used to fill the ioctl field, while newer or converted drivers using
their own locks used unlocked_ioctl. In current kernel versions, the big
kernel lock and the ioctl field no longer exist.
The compat_ioctl function handles ioctl calls from user programs
in compatibility mode with the 32-bit architecture version –
e.g., programs for the i386 architecture under a kernel for the
x86_64 architecture. If the structures passed through ioctl do not
contain fields of architecture-dependent size, both fields can be set
to the same function.
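As an illustration of a structure without fields of architecture-dependent size, the sketch below uses only fixed-width integer types and carries a user pointer as a 64-bit integer. The structure and its field names are hypothetical; the point is only that its layout is the same for 32-bit and 64-bit programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical ioctl argument. There is no long and no pointer member,
 * and the 64-bit field comes first, so no padding depends on the
 * architecture.
 */
struct example_xfer {
    uint64_t user_addr;   /* user pointer carried as an integer */
    uint32_t length;
    uint32_t flags;
};

/* Compile-time checks of the layout. */
static_assert(sizeof(struct example_xfer) == 16, "size must not vary");
static_assert(offsetof(struct example_xfer, length) == 8, "no hidden padding");
```

A structure that placed a 32-bit field before the 64-bit one could be laid out differently by i386 and x86_64 compilers, because the two ABIs align 64-bit integers differently.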
The first argument corresponds to the file descriptor passed by the
system call. The cmd argument is exactly the same as in the
system call. The optional arg argument is passed as an
unsigned long number regardless of the type used in the system
call.
Typically, the implementation of the ioctl operation simply
contains a switch statement selecting the appropriate behavior depending
on the value of the cmd argument. Different commands are
represented by different numbers, which are usually given names using
preprocessor definitions. The user program should be able to include
a header file with such declarations (usually the same one that is used when
compiling the driver module).
It is the responsibility of the driver interface developer to determine
the numerical values corresponding to the commands interpreted by the
driver. A simple choice, assigning successive small values to individual
commands, unfortunately is generally not a good solution. Commands
should be unique across the system to avoid errors when a correct
command is sent to an incorrect device. This situation may not occur
very often, but its consequences can be serious. With different
commands for all ioctl calls, in case of a mistake, -ENOTTY
will be returned instead of performing an unintended action.
The following macros (defined in linux/ioctl.h) should be used when
determining numerical values for commands:
_IO(type, nr) -- general-purpose command (without an argument)
_IOR(type, nr, dataitem) -- command that writes data to user space (the program reads from the driver)
_IOW(type, nr, dataitem) -- command that reads data from user space (the program writes to the driver)
_IOWR(type, nr, dataitem) -- command that both writes and reads
Designations:
type -- a number unique to the driver (8 bits, selected after reviewing Documentation/userspace-api/ioctl/ioctl-number.rst) – a magic number
nr -- sequential command number (8 bits)
dataitem -- the data type (usually a structure) associated with the command; its size usually cannot be greater than 16 KiB - 1 (depends on the value of _IOC_SIZEBITS). Encoding the size of the data written/read into the command number can be helpful for detecting programs compiled with outdated driver versions and helps to avoid, for example, writing beyond the buffer.
Example:
#define DN_SETCOUNT _IOR(0, 3, int)
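To see what the macros encode, the sketch below defines a small hypothetical command set and takes the numbers apart again with the _IOC_* helpers from linux/ioctl.h, which is also exported to user space. The magic number 'x', the EXAMPLE_* names and struct example_range are made up for illustration and are not taken from any real driver.

```c
#include <assert.h>
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOC_TYPE, _IOC_SIZE, ... */

/* Hypothetical argument of one of the commands. */
struct example_range {
    unsigned int first;
    unsigned int last;
};

#define EXAMPLE_IOC_MAGIC  'x'
#define EXAMPLE_RESET      _IO(EXAMPLE_IOC_MAGIC, 0)
#define EXAMPLE_GET_COUNT  _IOR(EXAMPLE_IOC_MAGIC, 1, int)
#define EXAMPLE_SET_RANGE  _IOW(EXAMPLE_IOC_MAGIC, 2, struct example_range)

/* A command belongs to this driver if it carries the driver's magic number. */
static int example_cmd_is_ours(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == EXAMPLE_IOC_MAGIC;
}

/* Size of the data item that was encoded into the command number. */
static unsigned int example_payload_size(unsigned int cmd)
{
    assert(example_cmd_is_ours(cmd));
    return _IOC_SIZE(cmd);
}
```

A driver's ioctl handler can use a check like example_cmd_is_ours to return -ENOTTY for commands meant for another device before it reaches its switch statement.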
sysfs¶
In the kernel, there is often a need to grant access to certain device
data to user space. Using character devices for this purpose is rather
cumbersome – a character device is a fairly "heavy" object, and access
to it is done through a limited read/write interface or an
inconvenient ioctl interface.
The first solution to these problems in Linux was the proc file system,
allowing easy creation of a large number of special files for
communication with the user. However, it had several drawbacks: primarily
the lack of structure (everyone puts files wherever they like), and data
transfer requires costly and delicate formatting and parsing of the byte
stream.
To solve the problems with the proc file system, the sysfs file
system was created. It has the following features:
every device, driver, module, etc., in the system is based on a kobject structure and automatically receives a directory in sysfs
directories in sysfs are organized hierarchically – in the case of devices, each device is a subdirectory of the device it is connected to
relationships between objects are represented by symbolic links
object attributes are represented by files
it is mounted in /sys
To grant a user access to some functionality, you need to get access to
your kobject structure and attach attributes to it. In the case of
devices, the kobject structure is the kobj field of the
device structure.
Adding Attributes to Devices¶
To add an attribute (i.e., a file representing a single parameter) to
an object, you need to create a structure wrapping the attribute
structure appropriate for the object type. In the case of devices, this
is device_attribute:
struct attribute {
    const char *name;
    umode_t mode;
};
struct device_attribute {
    struct attribute attr;
    ssize_t (*show) (struct device *dev, struct device_attribute *attr, char *buf);
    ssize_t (*store) (struct device *dev, struct device_attribute *attr, const char *buf, size_t count);
};
And add it to our device through:
int device_create_file(struct device *device, const struct device_attribute *entry);
void device_remove_file(struct device *dev, const struct device_attribute *attr);
The show function is called when an attribute is read by the user
and has access to a PAGE_SIZE buffer to which it can write the
attribute value (and return its size). The store function is called
when an attribute is written and receives its complete value. Unlike
the read/write functions, these functions always operate on the
entire attribute value (no need to handle partial reads/writes).
Binary Attributes¶
Sometimes simple attributes are not enough and it is necessary to export
binary data, defining a custom implementation of read/write as
with a regular file. Binary attributes serve this purpose:
struct bin_attribute {
    struct attribute attr;
    size_t size;
    void *private;
    ssize_t (*read) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr, char *buf, loff_t off, size_t size);
    ssize_t (*write) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr, char *buf, loff_t off, size_t size);
    int (*mmap) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr, struct vm_area_struct *vma);
};
int sysfs_create_bin_file(struct kobject *kobj, const struct bin_attribute *attr);
void sysfs_remove_bin_file(struct kobject *kobj, const struct bin_attribute *attr);
Directories¶
For more complex devices, it may be useful to create a tree-like structure of objects. In this case, it is necessary to create your own object type:
struct kobj_type {
    void (*release)(struct kobject *kobj);
    const struct sysfs_ops *sysfs_ops;
    const struct attribute_group **default_groups;
};
struct sysfs_ops {
    ssize_t (*show)(struct kobject *kobj, struct attribute *attr, char *);
    ssize_t (*store)(struct kobject *kobj, struct attribute *attr, const char *, size_t);
};
The release function will be called by the kernel when all
references to the object disappear. sysfs_ops contains functions
handling attributes - e.g., in the case of struct device, these are
simply functions casting kobject to device, attribute to
device_attribute, and calling show/store from the given
attribute. default_groups is a pointer to a NULL-terminated array of
pointers to attribute groups, whose attributes will be added to the
objects of the given type upon creation (older kernels used a
default_attrs field here, a NULL-terminated array of pointers to
attributes).
To create a new object, use the function:
int kobject_init_and_add(
struct kobject *kobj,
struct kobj_type *ktype,
struct kobject *parent,
const char *fmt, ...);
Before calling it, you should initialize kobj with zeros.
To remove it (or rather, get rid of your reference - the object will be deleted when all references disappear), use:
void kobject_put(struct kobject *kobj);
We can also duplicate our reference by:
struct kobject *kobject_get(struct kobject *kobj);
To add attributes to such an object after its creation:
int sysfs_create_file(struct kobject *kobj, const struct attribute *attr);
void sysfs_remove_file(struct kobject *kobj, const struct attribute *attr);
Relationships between Objects¶
Creating symbolic links in sysfs is possible through the functions:
int sysfs_create_link(struct kobject *kobj, struct kobject *target, char *name);
void sysfs_remove_link(struct kobject *kobj, char *name);
Useful Macros¶
Instead of manually creating *_attribute structures, you can use
macros to simplify the writing process when creating attributes:
static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
This is the same as manual declaration:
static struct device_attribute dev_attr_foo = {
.attr = {
.name = "foo",
.mode = S_IWUSR | S_IRUGO,
},
.show = show_foo,
.store = store_foo,
};
Going further down this path, you can do even more:
static DEVICE_ATTR_RW(foo_rw);
static DEVICE_ATTR_RO(foo_ro);
static DEVICE_ATTR_WO(foo_wo);
These macros assume that the functions are named accordingly:
foo_rw_show, foo_rw_store, foo_ro_show, and foo_wo_store.
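A minimal sketch of a read-write attribute built with these macros. The attribute name level, the variable example_level and the helper example_add_attrs are hypothetical. The callbacks use the signatures of current kernels, where show and store also receive the struct device_attribute pointer, and sysfs_emit is assumed to be available (it was added in 5.10). Locking is omitted for brevity.

```c
#include <linux/device.h>
#include <linux/kernel.h>   /* kstrtoint() */
#include <linux/sysfs.h>    /* sysfs_emit() */

/* Hypothetical driver parameter exposed as the "level" attribute. */
static int example_level;

/* Called on read: format the whole value into the PAGE_SIZE buffer. */
static ssize_t level_show(struct device *dev, struct device_attribute *attr,
                          char *buf)
{
    return sysfs_emit(buf, "%d\n", example_level);
}

/* Called on write: buf holds the complete value written by the user. */
static ssize_t level_store(struct device *dev, struct device_attribute *attr,
                           const char *buf, size_t count)
{
    int value;
    int err = kstrtoint(buf, 10, &value);

    if (err)
        return err;         /* not a valid decimal integer */
    example_level = value;
    return count;           /* report the whole write as consumed */
}

/* Expands to a struct device_attribute named dev_attr_level. */
static DEVICE_ATTR_RW(level);

/* Hypothetical helper: attach the attribute to an existing device. */
static int example_add_attrs(struct device *dev)
{
    return device_create_file(dev, &dev_attr_level);
}
```

After example_add_attrs succeeds, the value can be read and written as a text file named level in the device's sysfs directory.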
For simple types, there are also versions with already implemented
_show and _store functions that operate on the indicated
variable:
static ulong var_ulong;
static int var_int;
static bool var_bool;
static DEVICE_ULONG_ATTR(foo_ulong, mode, var_ulong);
static DEVICE_INT_ATTR(foo_int, mode, var_int);
static DEVICE_BOOL_ATTR(foo_bool, mode, var_bool);
To pass the address of a variable, they use an additional structure
wrapping device_attribute:
struct dev_ext_attribute {
struct device_attribute attr;
void *var;
};
Small Task #8¶
The example driver attached to the materials unfortunately does not allow changing the greeting, causing problems when using it in non-English-speaking countries.
Extend the driver so that the system administrator can change the greeting:
add a parameter bufsize to the module, which is the maximum size of the greeting
at module startup, allocate (using kmalloc) a buffer of the given size, fill it with the default greeting "Hello, world!\n", and use it instead of the current static buffer
add support for the write operation, which will write to this buffer:
data should be written to the position currently indicated by the file descriptor
after writing, the final position in the file should be remembered and considered as the current size of the greeting (for the purpose of reading)
In addition, add a read-only sysfs file that displays the current value of hello_repeats.
Literature¶
Driver APIs general documentation, especially:
Driver Model, mainly "The Basic Device Structure" and "The Linux Kernel Device Model" for this class
A. Rubini, J. Corbet, G. Kroah-Hartman, Linux Device Drivers, 3rd edition, O'Reilly, 2005. (http://lwn.net/Kernel/LDD3/)
Books listed on the course website: http://students.mimuw.edu.pl/ZSO/