Class 9: Character Devices¶
Date: 29.04.2025
Additional Materials¶
About Devices and Drivers in Linux¶
A device driver is a collection of code that supports a device (usually hardware, but virtual devices exist) and exports a set of functions allowing the device to be used by the user. The driver almost always has exclusive direct access to the device. It is often responsible for reconciling the needs of different users (although this is often handled by higher layers -- the file system can be considered such a layer for a hard drive).
Device drivers are usually kernel modules. If the device is not too "close" to the processor, it is also possible to write a driver in user space (USB devices are great for this), but we will not cover this in class.
To make a driver's functionality available to user programs, you must use one of many possible communication mechanisms with the kernel. The most common are:
block device file -- used for hard drives and sufficiently similar devices (floppy disk, CD-ROM, SSD, ...). The driver exposes functions for reading and writing blocks, and the block layer handles requests from the user (buffering, queuing, etc.).
character device file -- used for most types of devices. The driver simply provides functions corresponding to system calls that operate on files -- the kernel passes these calls directly to the driver, allowing the implementation of any interface.
network interface -- the driver exposes functions for sending and receiving packets, to which the network subsystem connects. The user can use it through socket calls.
file in proc -- used for drivers with a trivial interface (e.g., the entire driver functionality is reading/writing a single parameter).
file in sysfs -- as above, but a newer (and simpler) interface.
File Representation¶
Using character and block devices involves creating a corresponding special file somewhere in the file system (almost always /dev) and opening it. This special file is only a "gateway to the kernel", and the only information about it stored in the file system is:
a flag indicating the device type (b -- block, c -- character)
a major device number
a minor device number
Typically, the major number selects the driver, and the minor number selects a specific instance of the device supported by that driver. However, a single major number can also be shared by many drivers that each export only one device (e.g., major number 10 is shared by "misc" devices).
At the C language level, both numbers are packed into a single number of type dev_t.
The following macros are useful (linux/kdev_t.h):
unsigned int MAJOR(dev_t number) -- returns the major device number.
unsigned int MINOR(dev_t number) -- returns the minor device number.
dev_t MKDEV(int major, int minor) -- packs both numbers into the dev_t type.
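To make the packing concrete, here is a userspace copy of the in-kernel encoding (20 bits for the minor number, as in linux/kdev_t.h). The macro names are prefixed with K_ and the typedef is hypothetical, which avoids clashes with user headers. This in-kernel layout is not the one user space sees in st_rdev. There, the major() and minor() macros from sys/sysmacros.h must be used instead.

```c
#include <assert.h>

/* Hypothetical name: the kernel's dev_t is a 32-bit unsigned value. */
typedef unsigned int kdev_t_model;

/* Userspace copies of the linux/kdev_t.h definitions. */
#define K_MINORBITS 20
#define K_MINORMASK ((1U << K_MINORBITS) - 1)
#define K_MAJOR(dev) ((unsigned int)((dev) >> K_MINORBITS))  /* top 12 bits */
#define K_MINOR(dev) ((unsigned int)((dev) & K_MINORMASK))   /* low 20 bits */
#define K_MKDEV(ma, mi) (((ma) << K_MINORBITS) | (mi))       /* pass unsigned values */
```

For example, K_MKDEV(240u, 7u) yields 0x0F000007, from which K_MAJOR and K_MINOR recover 240 and 7.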
To create a device file manually, you can use the following command:
mknod /dev/filename type major minor
In the output of ls -l, the device numbers (major, minor) are shown where the file size usually is.
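Programs can read the same numbers. stat reports them in st_rdev, and they are decoded with major() and minor() from sys/sysmacros.h. Below is a small sketch. The function name device_numbers is hypothetical, and /dev/null (character device 1:3 on Linux) is only an example path.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Hypothetical helper: fills the type letter as ls -l would show it
 * ('c', 'b', or '-' for non-devices) and the device numbers.
 * Returns 0 on success, -1 if stat fails. */
static int device_numbers(const char *path, char *type,
                          unsigned int *maj, unsigned int *min)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	if (S_ISCHR(st.st_mode))
		*type = 'c';
	else if (S_ISBLK(st.st_mode))
		*type = 'b';
	else
		*type = '-';            /* not a device file */
	*maj = major(st.st_rdev);   /* userspace decoding of st_rdev */
	*min = minor(st.st_rdev);
	return 0;
}
```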
The kernel exports up-to-date information about available devices in the sysfs file system, and the udevd program (or systemd-udevd) continuously monitors this information and creates the appropriate files in /dev. Since the numbers no longer have to be agreed on in advance, they are allocated dynamically. For this mechanism to work, the driver must register its device in the sysfs hierarchy.
Character Device Drivers in Linux¶
Registering Device Drivers¶
Registering a driver involves several steps:
Allocate a range of device numbers:
int alloc_chrdev_region(dev_t *dev, unsigned baseminor, unsigned count, const char *name);
void unregister_chrdev_region(dev_t first, unsigned int count);
This allocation is usually done only once, in the initialization function, and reserves in advance the whole range of numbers the driver might need. The first allocated number is stored in *dev, so the assigned major number can be read with MAJOR(*dev); name is the driver name shown in /proc/devices.
Prepare the file_operations structure describing the operations on our device. Such structures are usually global (there is no point in allocating them dynamically).
Create and fill the cdev structure:
void cdev_init(struct cdev *cdev, const struct file_operations *fops);
You can also request dynamic allocation of the structure:
struct cdev *cdev_alloc(void);
In this case, you must manually set the ops field to point to our structure (do not mix this call with cdev_init).
Register our cdev structure:
int cdev_add(struct cdev *p, dev_t dev, unsigned count);
void cdev_del(struct cdev *p);
At this point, our device becomes available to user space: opening the corresponding special file will call our operations.
We can register devices individually or as a range (the count parameter).
If the cdev structure was created by cdev_alloc, it will be freed automatically by cdev_del. If, on the other hand, it was initialized by cdev_init, freeing it is the driver's responsibility.
Note that cdev_del only removes the device from the kernel's device table; it does not guarantee that no one is still using it. Previously opened file descriptors will keep working (although if we are in module_exit, we are guaranteed that no such descriptors exist, because every open file holds a reference to the module through the owner field of file_operations). When implementing, e.g., a device that should support hot-unplug, you must ensure this yourself, e.g., by reference counting.
Register the device class in sysfs (or use an existing one if it fits):
struct class my_class = { .name = "abc", };
int class_register(struct class *class);
void class_unregister(struct class *class);
This is done only once for all our devices (or once per device type if we have several).
Register our device in sysfs:
struct device *device_create(struct class *cls, struct device *parent, dev_t devt, void *drvdata, const char *fmt, ...);
void device_destroy(struct class *cls, dev_t devt);
parent points to the device to which our device is connected -- the sysfs directory corresponding to our device will be a subdirectory of the directory of the specified device. For character device drivers corresponding to, e.g., PCI devices, parent is set to the dev field of the pci_dev structure. You can set this parameter to NULL to create a top-level device.
drvdata can be used to store additional private information for our driver (useful if, e.g., we want to create files in sysfs to control our device); it can later be retrieved with dev_get_drvdata.
fmt and the subsequent parameters are formatted as by sprintf to create the device name that will appear in /dev.
device_create returns an ERR_PTR value on failure, so its result should be checked with IS_ERR.
At this point, udevd will receive a notification about the new device and (asynchronously, after a short while) create the corresponding file in /dev.
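Each step above has a matching teardown call. If a later step fails in the initialization function, the earlier steps must be undone in reverse order, and the usual kernel idiom for this is a chain of goto labels. The block below is a userspace model of that idiom, not kernel code. The step_do/step_undo functions, the registered array and the fail_at knob are hypothetical stand-ins for the real calls, named in the comments, and exist only so that the unwinding order can be tested.

```c
#include <assert.h>

/* Hypothetical bookkeeping: registered[i] == 1 means step i is in effect. */
static int registered[4];
/* Hypothetical failure injection: index of the step that fails, -1 = none. */
static int fail_at = -1;

static int step_do(int i)
{
	if (i == fail_at)
		return -1;            /* a kernel call would return -errno */
	registered[i] = 1;
	return 0;
}

static void step_undo(int i)
{
	registered[i] = 0;
}

/* Models module_init: steps 0..3 = region, cdev, class, device. */
static int model_init(void)
{
	int err;

	err = step_do(0);             /* alloc_chrdev_region */
	if (err)
		goto out;
	err = step_do(1);             /* cdev_init + cdev_add */
	if (err)
		goto undo_region;
	err = step_do(2);             /* class_register */
	if (err)
		goto undo_cdev;
	err = step_do(3);             /* device_create (check with IS_ERR) */
	if (err)
		goto undo_class;
	return 0;

undo_class:
	step_undo(2);                 /* class_unregister */
undo_cdev:
	step_undo(1);                 /* cdev_del */
undo_region:
	step_undo(0);                 /* unregister_chrdev_region */
out:
	return err;
}

/* Models module_exit: the full teardown, always in reverse order. */
static void model_exit(void)
{
	for (int i = 3; i >= 0; i--)
		step_undo(i);         /* device_destroy, ..., unregister_chrdev_region */
}
```

Each label undoes exactly the steps that succeeded before the failing one. Adding a step therefore means adding one label, not duplicating cleanup code.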
The file_operations structure¶
The file_operations structure (defined in linux/fs.h) describes how to perform operations on a given file. Every file (and, in general, everything that can be an open file descriptor) in Linux has such a structure -- for ordinary files, it is provided by the file system driver; for character devices, it is provided by the device driver. It has many fields (corresponding to operations), the most important of which are:
struct file_operations {
struct module *owner;
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
int (*open) (struct inode *, struct file *);
int (*release) (struct inode *, struct file *);
/* ... */
};
We must set the owner field to THIS_MODULE -- this allows the kernel to manage the module's reference counter automatically, so that the module cannot be unloaded while one of its files is open.
Fields of the file structure¶
The file structure (defined in linux/fs.h) represents an open file within the kernel. It is created by the kernel when open is called and passed to all operations on the file until the last close call (i.e., until release is called). Note that an open file (the file structure) is a different thing from a file on disk (represented by the inode structure).
The most important fields of this structure are:
struct file {
fmode_t f_mode;
loff_t f_pos;
unsigned int f_flags;
const struct file_operations *f_op;
void *private_data;
/* ... */
};
The f_mode field allows you to determine whether the file is open for reading (FMODE_READ), writing (FMODE_WRITE), or both. You do not need to check this field in the read and write functions, because the kernel performs this check before calling the driver's function.
The f_pos field holds the current position for reading or writing (used by read, write, lseek, etc.).
The f_flags field is mainly used to check whether operations should be blocking or not (O_NONBLOCK), although it contains many more flags.
The f_op field points to the set of functions that implement file operations. The kernel sets it (to the operations from the cdev structure) when open is called and then uses it for all subsequent operations (the driver can replace the value of this field in open to choose an alternative set of functions).
The private_data pointer is set to NULL when the file is opened. The driver can use this pointer for its own purposes (in which case it is responsible for freeing the memory it points to).
The open operation -- opening a file¶
The prototype of this function looks as follows:
int open(struct inode *inode, struct file *filp)
The open operation allows the driver to do preparatory work before any other operation. Usually, the following steps are performed:
check for device-related errors (e.g., check whether the device is ready);
initialize the device if it is being opened for the first time and we are using lazy initialization;
identify the minor number (MINOR(inode->i_rdev), or equivalently iminor(inode)) and, if necessary, replace the set of operations pointed to by f_op;
allocate memory for per-open data, initialize the data structures, and store a pointer to them in private_data.
The release operation -- closing a file¶
The kernel keeps a reference counter for each existing file structure. It can be increased, for example, by calling dup or by inheriting an open file through fork. When this counter finally drops to 0 (close is called on the last descriptor, or the last process holding such a descriptor calls exit), the release function is called, serving as the file's destructor:
int release(struct inode *inode, struct file *filp)
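Its task is to free the resources allocated in the open operation and perform similar cleanup:
free the private_data memory;
turn off the device on its last release call (release runs once per file structure, so if the device can be opened several times, the driver has to count the opens itself).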
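The reference counting of file structures can be observed from user space with any file type. The sketch below uses a pipe instead of a character device, so that it runs without a driver. The write end is duplicated with dup. The reader sees end-of-file only after both descriptors are closed, because that is when the pipe's own release operation runs. The function name is hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if EOF appears only after the last write descriptor is closed,
 * 0 if not, and -1 on setup errors. */
static int eof_only_after_last_close(void)
{
	int fds[2];
	char c;

	if (pipe(fds) < 0)
		return -1;
	int extra = dup(fds[1]);        /* second descriptor, same struct file */
	if (extra < 0)
		return -1;

	close(fds[1]);                  /* refcount 2 -> 1: release not called */
	if (write(extra, "x", 1) != 1)  /* the open file still works */
		return 0;
	if (read(fds[0], &c, 1) != 1 || c != 'x')
		return 0;

	close(extra);                   /* refcount 1 -> 0: release is called */
	ssize_t n = read(fds[0], &c, 1); /* no writers left: read returns 0 */
	close(fds[0]);
	return n == 0;
}
```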
The read and write operations -- data transfer¶
The prototypes of these functions look as follows:
ssize_t read(struct file *filp, char __user *buff, size_t count, loff_t *offp)
ssize_t write(struct file *filp, const char __user *buff, size_t count, loff_t *offp)
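The task of the read operation is to copy a portion of data from the kernel address space to the specified address (buff) in the user address space. The write operation works in the opposite direction. Because buff is a user-space pointer, the driver must not dereference it directly; the copying is done with copy_to_user (in read) and copy_from_user (in write), which return the number of bytes that could not be copied. These functions are used to implement many system calls (read, pread, readv, ...).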
offp is a pointer to the current position in the file. If such a position makes sense for our file, we read the position from there and store the updated value there. For a regular read and write, the value comes from filp->f_pos (and the updated value is written back there), while for pread and pwrite it points to a variable on the stack containing the system call parameter. The driver should therefore use *offp rather than accessing filp->f_pos directly.
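The offp rules can be checked with a userspace model of a read handler that serves a fixed buffer. It is not kernel code. memcpy stands in for copy_to_user, whose nonzero result a real driver would turn into -EFAULT. The name model_read and the greeting buffer are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

static const char greeting[] = "Hello, world!\n";   /* hypothetical data */

/* Same shape as the driver's read: copies at most count bytes from *offp. */
static ssize_t model_read(char *buff, size_t count, off_t *offp)
{
	size_t size = sizeof(greeting) - 1;      /* without the trailing NUL */

	if (*offp < 0)
		return -EINVAL;
	if ((size_t)*offp >= size)
		return 0;                        /* end of file */
	if (count > size - (size_t)*offp)
		count = size - (size_t)*offp;    /* short read near the end */

	memcpy(buff, greeting + *offp, count);   /* copy_to_user in the kernel */
	*offp += count;                          /* report the new position */
	return (ssize_t)count;
}
```

Reading 5 bytes from position 0 returns 5 and moves the position to 5. A second large read returns only the remaining 9 bytes, and every later read returns 0 (end of file).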
The value returned by this function is interpreted as follows:
a value greater than zero is the number of bytes copied; if it is equal to the count argument passed to the system call, the call fully succeeded; if it is smaller, only part of the data was transferred, and the program should be expected to repeat the system call for the rest (e.g., this is the standard behavior of the library functions fread/fwrite);
0 means that the end of the file has been reached (used only in read);
a negative value is an error code (e.g., -EFAULT).
When implementing these operations, remember to maintain the correct semantics - returning an error means that no bytes were read/written. If our driver detects an error only after a certain amount of transferred bytes (and there is no easy way to undo the transfer), return the number of transferred bytes instead of an error - the error code will be returned when the user repeats the operation for the remaining bytes.
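On the user side, these rules mean that a program needing exactly count bytes has to loop. Below is a minimal sketch with the hypothetical helper name read_full. It retries after short reads and EINTR, and it stops at end of file. The test uses a pipe only as a convenient data source.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Reads up to count bytes, repeating read after partial transfers.
 * Returns the number of bytes read (less than count only at EOF or
 * after an error that followed some data), or -1 on an immediate error. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
	size_t done = 0;

	while (done < count) {
		ssize_t n = read(fd, (char *)buf + done, count - done);
		if (n < 0) {
			if (errno == EINTR)
				continue;       /* interrupted before any data: retry */
			return done ? (ssize_t)done : -1;
		}
		if (n == 0)
			break;                  /* end of file */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```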
The llseek operation -- changing the file position¶
The prototype of this function looks as follows:
loff_t llseek(struct file *filp, loff_t off, int whence)
The llseek operation implements the lseek and llseek system calls. The default kernel behavior when the llseek operation is not specified in the driver's operations is to change the f_pos field of the file structure. If changing the file position does not make sense for the device, provide a function that returns an error here. The kernel provides a ready-made function for this purpose, no_llseek, which always returns -ESPIPE.
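When a position does make sense, an llseek for a device with a known data size typically computes the new position from whence and rejects results outside the data. The userspace model below mirrors that logic. model_llseek and its size parameter are hypothetical, and in a real driver the result would be stored in filp->f_pos. For this case, the kernel also offers the ready-made helper fixed_size_llseek.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <sys/types.h>

/* Returns the new position, or a negative errno value as a driver would. */
static off_t model_llseek(off_t *f_pos, off_t size, off_t off, int whence)
{
	off_t newpos;

	switch (whence) {
	case SEEK_SET:
		newpos = off;
		break;
	case SEEK_CUR:
		newpos = *f_pos + off;
		break;
	case SEEK_END:
		newpos = size + off;
		break;
	default:
		return -EINVAL;
	}
	if (newpos < 0 || newpos > size)
		return -EINVAL;         /* position unchanged on error */
	*f_pos = newpos;
	return newpos;
}
```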
The ioctl operation -- invoking device-specific commands¶
The function prototypes look as follows:
long (*unlocked_ioctl) (struct file *filp, unsigned int cmd, unsigned long arg);
long (*compat_ioctl) (struct file *filp, unsigned int cmd, unsigned long arg);
The unlocked_ioctl function handles ioctl calls from programs built for the kernel's "main" architecture. The name is a historical artifact from the big kernel lock era -- drivers requiring the big kernel lock used to fill the ioctl field, while newer or converted drivers using their own locks used unlocked_ioctl. In current kernel versions, neither the big kernel lock nor the ioctl field exists anymore.
The compat_ioctl function handles ioctl calls from user programs running in compatibility mode, i.e., built for the 32-bit version of the architecture -- e.g., programs for the i386 architecture under a kernel for the x86_64 architecture. If the structures passed through ioctl do not contain fields whose size (or alignment) depends on the architecture, both fields can be set to the same function.
The first argument is the open file corresponding to the file descriptor passed to the system call. The cmd argument is exactly the same as in the system call. The optional arg argument is passed as an unsigned long regardless of the type used in the system call; if it carries a pointer, it must be cast to void __user * and accessed with copy_from_user/copy_to_user.
Typically, the implementation of the ioctl operation simply contains a switch statement selecting the appropriate behavior depending on the value of the cmd argument. Different commands are represented by different numbers, which are usually given names using preprocessor definitions. The user program should be able to include a header file with such declarations (usually the same one that is used when compiling the driver module).
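Below is a sketch of such a switch. It is a userspace model, so that it compiles and can be tested on its own. The DEMO_* commands, the magic number 'z' (not checked against the registry) and struct demo_state are all hypothetical. memcpy stands in for copy_to_user/copy_from_user on the pointer carried in arg. Unknown commands return -ENOTTY.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/ioctl.h>

/* Hypothetical shared header contents (driver and user program). */
#define DEMO_IOC_MAGIC   'z'
#define DEMO_RESET       _IO(DEMO_IOC_MAGIC, 0)
#define DEMO_GET_REPEATS _IOR(DEMO_IOC_MAGIC, 1, int)
#define DEMO_SET_REPEATS _IOW(DEMO_IOC_MAGIC, 2, int)

struct demo_state { int repeats; };

/* Models unlocked_ioctl; for the _IOR/_IOW commands arg holds a pointer. */
static long model_ioctl(struct demo_state *st, unsigned int cmd, unsigned long arg)
{
	int val;

	switch (cmd) {
	case DEMO_RESET:
		st->repeats = 1;
		return 0;
	case DEMO_GET_REPEATS:
		memcpy((void *)arg, &st->repeats, sizeof(int)); /* copy_to_user */
		return 0;
	case DEMO_SET_REPEATS:
		memcpy(&val, (const void *)arg, sizeof(int));   /* copy_from_user */
		if (val < 0)
			return -EINVAL;
		st->repeats = val;
		return 0;
	default:
		return -ENOTTY;         /* not one of our commands */
	}
}
```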
It is the responsibility of the driver interface designer to choose the numerical values of the commands interpreted by the driver. The simple choice of assigning successive small values to individual commands is, unfortunately, generally not a good solution. Commands should be unique across the system to avoid errors when a correct command is sent to the wrong device. This situation may not occur very often, but its consequences can be serious. If all ioctl commands are distinct, such a mistake makes the call return -ENOTTY instead of performing an unintended action.
The following macros (defined in linux/ioctl.h) should be used to determine the numerical values of commands:
_IO(type, nr) -- general-purpose command (without an argument)
_IOR(type, nr, dataitem) -- command in which the driver writes data to user space (the user reads)
_IOW(type, nr, dataitem) -- command in which the driver reads data from user space (the user writes)
_IOWR(type, nr, dataitem) -- command transferring data in both directions
Designations:
type -- a magic number unique to the driver (8 bits), chosen after reviewing Documentation/userspace-api/ioctl/ioctl-number.rst
nr -- sequential command number (8 bits)
dataitem -- the type of the structure associated with the command (the macro encodes its size); the size usually cannot exceed 16 KiB - 1 (it depends on the value of _IOC_SIZEBITS). Encoding the size of the structure written/read as a parameter helps detect programs compiled against outdated driver headers and helps avoid, for example, writing beyond the buffer.
Example:
#define DN_SETCOUNT _IOR(0, 3, int)
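The fields packed by these macros can be recovered with the companion macros _IOC_DIR, _IOC_TYPE, _IOC_NR and _IOC_SIZE from the same header, which user programs can also include. The check below assumes a hypothetical struct demo_cfg and the unregistered magic number 'z'.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical argument structure; fixed-width fields keep its size the
 * same for 32-bit and 64-bit programs. */
struct demo_cfg {
	uint32_t repeats;
	uint32_t flags;
};

#define DEMO_SET_CFG _IOW('z', 5, struct demo_cfg)

/* Returns 1 if cmd is a "user writes" command carrying a struct demo_cfg. */
static int is_demo_set_cfg(unsigned int cmd)
{
	return _IOC_TYPE(cmd) == 'z' &&
	       _IOC_NR(cmd) == 5 &&
	       _IOC_DIR(cmd) == _IOC_WRITE &&
	       _IOC_SIZE(cmd) == sizeof(struct demo_cfg);
}
```

A command built with the same type and number but the wrong direction or structure size fails this check. This is how the size encoding catches programs built against a different version of the header.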
sysfs¶
In the kernel, there is often a need to give user space access to certain device data. Using character devices for this purpose is rather cumbersome -- a character device is a fairly "heavy" object, and it is accessed through the limited read/write interface or the inconvenient ioctl interface.
The first solution to these problems in Linux was the proc file system, which allows easy creation of a large number of special files for communicating with the user. However, it had several drawbacks: primarily the lack of structure (everyone puts files wherever they like), and the fact that data transfer requires costly and delicate formatting and parsing of a byte stream.
To solve the problems of the proc file system, the sysfs file system was created. It has the following features:
every device, driver, module, etc. in the system is based on a kobject structure and automatically receives a directory in sysfs
directories in sysfs are organized hierarchically -- in the case of devices, each device is a subdirectory of the device it is connected to
relationships between objects are represented by symbolic links
object attributes are represented by files
it is mounted at /sys
To give the user access to some functionality, you need to get hold of your kobject structure and attach attributes to it. In the case of devices, the kobject structure is the kobj field of the device structure.
Adding Attributes to Devices¶
To add an attribute (i.e., a file representing a single parameter) to an object, you need to create a structure that wraps the attribute structure and is appropriate for the object type. In the case of devices, this is device_attribute:
struct attribute {
const char *name;
umode_t mode;
/* ... */
};
struct device_attribute {
struct attribute attr;
ssize_t (*show) (struct device *dev, struct device_attribute *attr, char *buf);
ssize_t (*store) (struct device *dev, struct device_attribute *attr, const char *buf, size_t count);
};
It is then added to our device with:
int device_create_file(struct device *device, const struct device_attribute *entry);
void device_remove_file(struct device *dev, const struct device_attribute *attr);
The show function is called when the user reads the attribute. It gets a buffer of PAGE_SIZE bytes, writes the attribute value into it, and returns the number of bytes written (current kernels provide sysfs_emit for filling this buffer). The store function is called when the attribute is written and receives its complete value; it returns the number of bytes consumed (usually count) or a negative error code. Unlike the read/write functions, these functions always operate on the entire attribute value (there is no need to handle partial reads/writes).
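Below is a userspace model of a typical show/store pair for an integer attribute, runnable outside the kernel. snprintf into a page-sized buffer stands in for sysfs_emit, and strtol stands in for kstrtoint. The names model_show, model_store and value, the 4096-byte page size and the allowed range 0..1000 are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define MODEL_PAGE_SIZE 4096    /* assumption: the common PAGE_SIZE */

static int value = 1;           /* hypothetical variable behind the attribute */

/* show: write the whole value, newline-terminated, and return its length. */
static ssize_t model_show(char *buf)
{
	return snprintf(buf, MODEL_PAGE_SIZE, "%d\n", value);  /* sysfs_emit */
}

/* store: parse the complete (NUL-terminated) value; return count on success. */
static ssize_t model_store(const char *buf, size_t count)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(buf, &end, 10);                  /* kstrtoint in the kernel */
	if (end == buf || errno || v < 0 || v > 1000)  /* hypothetical range */
		return -EINVAL;
	if (*end == '\n')                           /* "echo 42 > attr" adds \n */
		end++;
	if (*end != '\0')
		return -EINVAL;
	value = (int)v;
	return (ssize_t)count;                      /* whole value consumed */
}
```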
Binary Attributes¶
Sometimes simple attributes are not enough, and it is necessary to export binary data with a custom read/write implementation, as with a regular file. This is what binary attributes are for:
struct bin_attribute {
struct attribute attr;
size_t size;
void *private;
ssize_t (*read) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr, char *buf, loff_t off, size_t count);
ssize_t (*write) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr, char *buf, loff_t off, size_t count);
int (*mmap) (struct file *filp, struct kobject *kobj, struct bin_attribute *attr, struct vm_area_struct *vma);
/* ... */
};
int sysfs_create_bin_file(struct kobject *kobj, const struct bin_attribute *attr);
void sysfs_remove_bin_file(struct kobject *kobj, const struct bin_attribute *attr);
Directories¶
For more complex devices, it may be useful to have the ability to create a tree-like structure of objects. In this case, it is necessary to create your own object type:
struct kobj_type {
void (*release)(struct kobject *kobj);
const struct sysfs_ops *sysfs_ops;
const struct attribute_group **default_groups;
/* ... */
};
struct sysfs_ops {
ssize_t (*show)(struct kobject *kobj, struct attribute *attr, char *);
ssize_t (*store)(struct kobject *kobj, struct attribute *attr, const char *, size_t);
};
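The release function will be called by the kernel when all references to the object are gone. sysfs_ops contains the functions handling attributes -- e.g., in the case of struct device, these are simply functions that cast kobject to device and attribute to device_attribute, and then call show/store of the given attribute. default_groups is a NULL-terminated array of pointers to attribute groups (struct attribute_group, each containing a NULL-terminated array of attributes) that will be added to objects of the given type upon creation; the ATTRIBUTE_GROUPS(foo) macro builds such an array, foo_groups, from an attribute array named foo_attrs. Older kernels used a default_attrs field instead, which pointed directly to a NULL-terminated array of attributes.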
To create a new object, use the function:
int kobject_init_and_add(
struct kobject *kobj,
struct kobj_type *ktype,
struct kobject *parent,
const char *fmt, ...);
Before calling it, you should initialize kobj with zeros. If the call fails, the reference it created must still be dropped with kobject_put.
To remove the object (or rather, drop your reference to it -- the object is destroyed when all references are gone), use:
void kobject_put(struct kobject *kobj);
We can also take an additional reference with:
struct kobject *kobject_get(struct kobject *kobj);
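The get/put pair is plain reference counting with a release callback. Below is a userspace model of that idea. The names model_obj, model_create, model_get, model_put and model_release are hypothetical. The real kobject keeps its count in a struct kref with atomic operations, which this single-threaded sketch does not need.

```c
#include <assert.h>
#include <stdlib.h>

struct model_obj {
	int refcount;
	void (*release)(struct model_obj *obj);   /* like kobj_type->release */
};

static int released;    /* lets the test observe the release call */

static void model_release(struct model_obj *obj)
{
	released = 1;
	free(obj);
}

static struct model_obj *model_create(void)
{
	struct model_obj *obj = calloc(1, sizeof(*obj)); /* zeroed, like a kobject */

	if (obj) {
		obj->refcount = 1;          /* the creator holds the first reference */
		obj->release = model_release;
	}
	return obj;
}

static struct model_obj *model_get(struct model_obj *obj)
{
	obj->refcount++;
	return obj;
}

static void model_put(struct model_obj *obj)
{
	if (--obj->refcount == 0)
		obj->release(obj);          /* last reference dropped: destroy */
}
```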
To add attributes to such an object after its creation:
int sysfs_create_file(struct kobject *kobj, const struct attribute *attr);
void sysfs_remove_file(struct kobject *kobj, const struct attribute *attr);
Relationships between Objects¶
Creating symbolic links in sysfs is possible through the functions:
int sysfs_create_link(struct kobject *kobj, struct kobject *target, char *name);
void sysfs_remove_link(struct kobject *kobj, char *name);
Useful Macros¶
Instead of creating *_attribute structures manually, you can use macros that simplify declaring attributes:
static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
This is equivalent to the manual declaration:
static struct device_attribute dev_attr_foo = {
.attr = {
.name = "foo",
.mode = S_IWUSR | S_IRUGO,
},
.show = show_foo,
.store = store_foo,
};
Going further down this path, you can do even more:
static DEVICE_ATTR_RW(foo_rw);
static DEVICE_ATTR_RO(foo_ro);
static DEVICE_ATTR_WO(foo_wo);
These macros assume that the functions are named accordingly: foo_rw_show, foo_rw_store, foo_ro_show, and foo_wo_store. They also choose the file mode: 0644 for _RW, 0444 for _RO, and 0200 for _WO.
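These macros rely on preprocessor stringizing and token pasting. The simplified userspace imitation below shows how the name foo_rw becomes the variable dev_attr_foo_rw and the callbacks foo_rw_show/foo_rw_store. MY_ATTR_RW, struct my_attr and the callbacks are hypothetical. In the kernel, the real macros are built from __ATTR_RW in linux/sysfs.h and DEVICE_ATTR_RW in linux/device.h.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

struct my_attr {
	const char *name;
	unsigned int mode;
	ssize_t (*show)(char *buf);
	ssize_t (*store)(const char *buf, size_t count);
};

/* #_name turns the identifier into a string; _name##_show pastes tokens. */
#define MY_ATTR_RW(_name) \
	struct my_attr dev_attr_##_name = { #_name, 0644, _name##_show, _name##_store }

static ssize_t foo_rw_show(char *buf) { buf[0] = '\0'; return 0; }
static ssize_t foo_rw_store(const char *buf, size_t count) { (void)buf; return (ssize_t)count; }

static MY_ATTR_RW(foo_rw);      /* defines dev_attr_foo_rw */
```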
For simple types, there are also versions with ready-made _show and _store functions that operate on the indicated variable:
static ulong var_ulong;
static int var_int;
static bool var_bool;
static DEVICE_ULONG_ATTR(foo_ulong, mode, var_ulong);
static DEVICE_INT_ATTR(foo_int, mode, var_int);
static DEVICE_BOOL_ATTR(foo_bool, mode, var_bool);
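To pass the address of the variable, they use an additional structure wrapping device_attribute (so, e.g., dev_attr_foo_int is a struct dev_ext_attribute, and the attribute to register is &dev_attr_foo_int.attr):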
struct dev_ext_attribute {
struct device_attribute attr;
void *var;
};
Small Task #8¶
The example driver attached to the materials unfortunately does not allow changing the greeting, which causes problems when using it in non-English-speaking countries.
Add to the driver the ability for the system administrator to change the greeting:
add a module parameter bufsize, which is the maximum size of the greeting
at module startup, allocate (using kmalloc) a buffer of the given size, fill it with the default greeting "Hello, world!\n", and use it instead of the current static buffer
add support for the write operation, which writes to this buffer
data should be written at the position currently indicated by the file descriptor
after writing, the final position in the file should be remembered and treated as the current size of the greeting (for the purpose of reading)
Moreover, add a sysfs file that displays the current value of hello_repeats (read-only).
Literature¶
Driver APIs general documentation, especially:
Driver Model, mainly "The Basic Device Structure" and "The Linux Kernel Device Model" for this class
A. Rubini, J. Corbet, G. Kroah-Hartman, Linux Device Drivers, 3rd edition, O'Reilly, 2005. (http://lwn.net/Kernel/LDD3/)
Books listed on the course website: http://students.mimuw.edu.pl/ZSO/