Task 2: The HardDoom™ Device driver

Announced at: 10.04.2018

Due at: 15.05.2018 (final 29.05.2018)

Extra materials

Introduction

The task is to write a driver for the HardDoom™ device, which is a graphics accelerator designed for Doom. The device is delivered in the form of a modified version of qemu.

The device should be available to the user in the form of a character device. For each HardDoom™ device present in the system, create a /dev/doomX character device, where X is the index of the HardDoom™ device, starting with 0.

Character device interface

The device /dev/doom* is used only to create HardDoom™ resources – all the proper operations will be performed on the created resources. It should support the following operations:

  • open: obviously.
  • close: obviously.
  • ioctl(DOOMDEV_IOCTL_CREATE_SURFACE): creates a new frame buffer on the device. As a parameter of this call, the dimensions of the buffer (width and height) are transmitted. The width must be a multiple of 64 in the range 64 … 2048 and the height must be in the range 1 … 2048. The result of this call is a new file descriptor referring to the created buffer. The buffer created has undefined content.
  • ioctl(DOOMDEV_IOCTL_CREATE_TEXTURE): creates a new column texture on the device. The parameters of this call are the texture size in bytes (maximum 4MiB), the texture height in texels (maximum 1023, or 0 if the texture is not to be repeated vertically), and a pointer to the texture data. The result is a file descriptor referring to the created texture.
  • ioctl(DOOMDEV_IOCTL_CREATE_FLAT): creates a new flat texture on the device. The parameter for this call is the data pointer (0x1000 bytes). The result is a file descriptor referring to the texture created.
  • ioctl(DOOMDEV_IOCTL_CREATE_COLORMAPS): creates a new array of color maps on the device. The parameters of this call are the size of the array (number of color maps) and the pointer to data (each color map is 0x100 bytes). The result is a file descriptor referring to the created array. The maximum allowable size for the array is 0x100 maps.
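
As an illustration of the limits stated for DOOMDEV_IOCTL_CREATE_SURFACE, here is a minimal sketch of a dimension check. The helper name check_surface_dims is hypothetical. The choice to report EOVERFLOW before EINVAL when both apply is an assumption; the task text only says that over-sized buffers yield EOVERFLOW and other invalid parameters yield EINVAL.

```c
#include <errno.h>
#include <stdint.h>

/* Limits taken from the DOOMDEV_IOCTL_CREATE_SURFACE description. */
#define SURF_WIDTH_ALIGN 64u
#define SURF_MAX_DIM     2048u

/* Hypothetical helper: returns 0 if the dimensions are acceptable,
 * -EOVERFLOW if they exceed what the hardware supports, and -EINVAL
 * for any other invalid value. */
int check_surface_dims(uint32_t width, uint32_t height)
{
    if (width > SURF_MAX_DIM || height > SURF_MAX_DIM)
        return -EOVERFLOW;
    if (width == 0 || height == 0)
        return -EINVAL;
    /* A non-zero multiple of 64 is automatically at least 64. */
    if (width % SURF_WIDTH_ALIGN != 0)
        return -EINVAL;
    return 0;
}
```

For example, check_surface_dims(640, 480) accepts the buffer, while a width of 100 is rejected with -EINVAL.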

Textures and color map arrays do not support any standard operations except close (which, if all other references have already been released, releases their memory) – they can only be used as parameters for drawing calls. It is also impossible to change their content in any way after creation.

All pointers are passed as uint64_t so that the structures have the same layout in 64-bit mode as in 32-bit mode, avoiding the need to define corresponding _compat structures. For the same reason, many structures have unused _pad fields.
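
To illustrate the layout argument, here is a small sketch with a hypothetical structure in the same style (the real definitions are in doomdev.h and may differ). The names example_create_texture and example_ptr are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument structure in the style described above. */
struct example_create_texture {
    uint64_t data_ptr;  /* user pointer, carried as a 64-bit integer */
    uint32_t size;
    uint16_t height;
    uint16_t _pad;      /* explicit padding, so no implicit padding is needed */
};

/* The layout does not depend on the pointer width of the caller. */
_Static_assert(sizeof(struct example_create_texture) == 16, "unexpected size");
_Static_assert(offsetof(struct example_create_texture, size) == 8,
               "unexpected offset");

/* Convert the integer back to a pointer on the receiving side. */
const void *example_ptr(uint64_t raw)
{
    return (const void *)(uintptr_t)raw;
}
```

This is a user-space illustration only; in kernel code the conversion would produce a __user pointer that is then passed to copy_from_user.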

The following operations can be called on the frame buffers:

  • ioctl(DOOMDEV_SURF_IOCTL_COPY_RECT): performs a series of COPY_RECT operations to a given buffer. Parameters are:

    • surf_src_fd: file descriptor pointing to the frame buffer from which the copy should be made.
    • rects_ptr: a pointer to an array of doomdev_copy_rect structures.
    • rects_num: number of rectangles to copy.
    • in the doomdev_copy_rect structures:
      • pos_dst_x, pos_dst_y – coordinates of the target rectangle in the given buffer (top left corner).
      • pos_src_x, pos_src_y – coordinates of the source rectangle in the source buffer.
      • width, height – size of the rectangle to be copied.

    The semantics of this call are quite similar to write: the driver tries to perform as many operations as possible from the given list, stopping in case of error or signal arrival. If the first operation failed, the error code is returned. Otherwise, the number of completed operations is returned. The user code is responsible for retrying when incomplete.

    The user is responsible for ensuring that, within one ioctl call, no pixel is both written and read (i.e., that no INTERLOCK command is needed between the rectangles). The driver does not have to detect such situations (but it may if it wants to) – sending the commands to the device and obtaining an incorrect drawing result is acceptable in such a situation.

  • ioctl(DOOMDEV_SURF_IOCTL_FILL_RECT): performs a series of FILL_RECT operations. Parameters:

    • rects_ptr: a pointer to an array of doomdev_fill_rect structures.
    • rects_num: number of rectangles to fill.
    • in the doomdev_fill_rect structures:
      • pos_dst_x, pos_dst_y – coordinates of the target rectangle in the given buffer.
      • width, height – size of the rectangle to be filled.
      • color – the fill color.

    The returned value is as in DOOMDEV_SURF_IOCTL_COPY_RECT.

  • ioctl(DOOMDEV_SURF_IOCTL_DRAW_LINE): performs a series of DRAW_LINE operations. Parameters:

    • lines_ptr: a pointer to an array of doomdev_line structures.
    • lines_num: number of lines to draw.
    • in the doomdev_line structures:
      • pos_a_x, pos_a_y: coordinates of the first endpoint of the line.
      • pos_b_x, pos_b_y: coordinates of the second endpoint.
      • color – the color of the line to be drawn.

    The returned value is as in DOOMDEV_SURF_IOCTL_COPY_RECT.

  • ioctl(DOOMDEV_SURF_IOCTL_DRAW_BACKGROUND): performs the DRAW_BACKGROUND operation. Parameters:

    • flat_fd: a file descriptor pointing to a flat texture that will serve as the background.

    In case of a successful call, 0 is returned.

  • ioctl(DOOMDEV_SURF_IOCTL_DRAW_COLUMNS): performs a series of DRAW_COLUMN operations. Parameters:

    • draw_flags: a combination of the following flags:
      • DOOMDEV_DRAW_FLAGS_FUZZ – if set, the fuzz effect will be rendered – most parameters are ignored (including other flags).
      • DOOMDEV_DRAW_FLAGS_TRANSLATE – if set, the palette will be remapped according to the translation color map.
      • DOOMDEV_DRAW_FLAGS_COLORMAP – if set, colors will be dimmed according to the color map.
    • texture_fd: a descriptor of the column texture (ignored if the FUZZ flag is set).
    • translation_fd: a descriptor of the color map array used by the TRANSLATE flag (ignored, if the flag is not set).
    • colormap_fd: a descriptor of the color map array used by the COLORMAP and FUZZ flags. Ignored, if none of these flags is set.
    • translation_idx: index of the color map used by the TRANSLATE option. Used only, if the TRANSLATE flag is set.
    • columns_num: number of columns to draw.
    • columns_ptr: a pointer to an array of doomdev_column structures:
      • column_offset: starting offset of this column in the texture.
      • ustart: an unsigned fixed-point 16.16 number, must be in the range supported by the hardware. Ignored, if the FUZZ flag is used.
      • ustep: an unsigned fixed-point 16.16 number, must be in the range supported by the hardware. Ignored, if the FUZZ flag is used.
      • x: the x coordinate of the column.
      • y1, y2: the y coordinates of the top and bottom pixels of the column.
      • colormap_idx: index of the color map used by FUZZ and COLORMAP flags. Ignored, if neither of those is set.

    The returned value is as in DOOMDEV_SURF_IOCTL_COPY_RECT.

  • ioctl(DOOMDEV_SURF_IOCTL_DRAW_SPANS): performs a series of DRAW_SPAN operations. Parameters:

    • flat_fd: a flat texture descriptor.
    • translation_fd: like above.
    • colormap_fd: like above.
    • draw_flags: like above, but without the FUZZ flag.
    • translation_idx: like above.
    • spans_num: number of spans to draw.
    • spans_ptr: a pointer to an array of doomdev_span structures:
      • ustart, vstart: like ustart above.
      • ustep, vstep: like ustep above.
      • x1, x2: the x coordinates of the leftmost and rightmost pixel of the span.
      • y: the y coordinate of the span.
      • colormap_idx: like above.

    The returned value is as in DOOMDEV_SURF_IOCTL_COPY_RECT.

  • lseek: sets the position in the buffer for subsequent read calls.

  • read, pread, readv, etc: waits for completion of all previously submitted drawing operations for the given buffer, and then reads the finished data from the buffer to the user space. In case of an attempt to read outside of buffer bounds, end-of-file should be returned.
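
The end-of-file rule for read can be sketched as a length clamp. This is a minimal sketch assuming the usual file semantics (a read that starts inside the buffer but runs past its end returns the remaining bytes, and a read starting at or beyond the end returns 0); the helper name surface_read_len is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes a read of `count` bytes at offset
 * `pos` may return from a buffer of `buf_size` bytes. A result of 0
 * means end-of-file. */
size_t surface_read_len(uint64_t buf_size, uint64_t pos, size_t count)
{
    if (pos >= buf_size)
        return 0;

    uint64_t left = buf_size - pos;
    return count < left ? count : (size_t)left;
}
```

In a driver, this computation would run only after all previously submitted drawing operations for the buffer have completed.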

The driver should detect commands with incorrect parameters (wrong file type passed as *_fd, coordinates extending beyond the frame buffer, etc.) and return the error EINVAL. If the user tries to create textures or frame buffers larger than those supported by the hardware, EOVERFLOW should be returned.
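
The bounds check and the write-like return convention of the batch ioctls can be combined as in the following sketch. The structures example_rect and example_surface are simplified, hypothetical stand-ins for the real ones, and the actual queuing of commands is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct example_rect    { uint16_t x, y, width, height; };
struct example_surface { uint32_t width, height; };

/* 0 if the rectangle lies entirely inside the surface, -EINVAL otherwise. */
int check_rect(const struct example_surface *s, const struct example_rect *r)
{
    if ((uint32_t)r->x + r->width > s->width)
        return -EINVAL;
    if ((uint32_t)r->y + r->height > s->height)
        return -EINVAL;
    return 0;
}

/* write-like convention: the number of operations completed, or the
 * error code if the very first operation failed. */
long process_rects(const struct example_surface *s,
                   const struct example_rect *rects, size_t num)
{
    size_t done;

    for (done = 0; done < num; done++) {
        int err = check_rect(s, &rects[done]);
        if (err)
            return done ? (long)done : err;
        /* A real driver would queue the command for rects[done] here. */
    }
    return (long)done;
}
```

With a 64x64 surface and three rectangles of which only the third is out of bounds, process_rects returns 2, and the caller is expected to retry from the third rectangle, at which point it receives -EINVAL.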

The driver should register its devices in sysfs so that udev automatically creates device files with appropriate names in /dev. The major and minor numbers for these devices are arbitrary (majors should be allocated dynamically).

A header file with the appropriate definitions can be found here: https://github.com/koriakin/prboom-plus/blob/doomdev/src/doomdev.h

The driver can assume a limit of 256 devices in the system.

Assumptions for interaction with hardware

It can be assumed that, before the driver is loaded, the device is in the state it has after a hardware reset. The device should also be left in this state when the driver is unloaded.

A fully-scored solution should work asynchronously – drawing ioctl operations should send commands to the device and return to the user space without waiting for completion (but if the command buffers are already full, it is acceptable to wait for free space to become available). Waiting for the end of the command should only be done when calling read which will actually need the drawing results.

Scoring

You can get up to 10 points for the task. The score is a sum of three parts:

  • full use of the device (from 0 to 2 points):
    • fully asynchronous operation (ioctl does not wait for completion of sent commands, starting to send commands by ioctl does not wait for the commands sent earlier to be finished, read does not require stopping the whole device): 1p
    • using the command fetch block: 1p
  • test result (from 0 to 8 points)
  • evaluation of the solution code (from 0 to -10 points)

Solution format

The driver should be implemented as a Linux kernel module in version 4.9.13. The module containing the driver should be named harddoom.ko. As the solution, you should deliver an archive containing:

  • module sources
  • Makefile and Kbuild files that can build the module
  • a brief description of the solution

QEMU

A modified version of qemu, available in the source version, is required to use the HardDoom™ device.

To compile a modified version of qemu:

  1. Clone the https://github.com/koriakin/qemu repository

  2. git checkout harddoom

  3. Ensure that dependencies are installed: ncurses, libsdl, curl, and in some distributions also ncurses-dev, libsdl-dev, curl-dev (package names may vary slightly depending on the distribution)

  4. Run ./configure with options as you like (see ./configure --help). The official binary was compiled with:

    --target-list=i386-softmmu,x86_64-softmmu --python=$(which python2)
    --audio-drv-list=alsa,pa
    
  5. Run make

  6. Install by executing make install, or run directly (the binary is x86_64-softmmu/qemu-system-x86_64).

To run the modified qemu with the HardDoom™ device, give it the -device harddoom option. Passing this option several times will cause emulation of several instances of the device.

To add a HardDoom™ device live to a working qemu:

  • go to monitor mode in qemu (Ctrl+Alt+2 in the qemu window)
  • enter device_add harddoom
  • go back to the regular screen with Ctrl+Alt+1
  • enter echo 1 >/sys/bus/pci/rescan so that Linux notices the new device

To simulate removing the device:

echo 1 > /sys/bus/pci/devices/0000:<device id>/remove

Tests

To test the driver we have prepared a modified version of prboom-plus, which is a modernized version of the Doom game engine. To start it:

  • install the following packages in the image:
    • libsdl2-dev
    • libsdl2-mixer-dev
    • libsdl2-image-dev
    • libsdl2-net-dev
    • xfce4 [or other graphic environment]
    • xserver-xorg
    • autoconf
  • download sources from the https://github.com/koriakin/prboom-plus repository
  • choose branch doomdev
  • compile and install the sources (without installation, the program will not be able to find its data file, prboom-plus.wad):
    • ./bootstrap
    • ./configure --prefix=$HOME
    • make
    • make install
  • download the game data file. You can use any of the following files:
    • freedoom1.wad or freedoom2.wad from the Freedoom project (https://freedoom.github.io/) – a Doom clone available under a free license.
    • doom.wad or doom2.wad from the full version of the original game, if you bought one.
    • doom1.wad from the shareware version of the original game.
  • load our driver and make sure we have access to /dev/doom0
  • start X11, and in it the game: $HOME/bin/prboom-plus -iwad <game.wad>
  • in the Options -> General -> Video mode menu, select the “doomdev” option (the default “8bit” setting selects software rendering in a very similar mode to our device and can be used to compare the results).

In order for the sound to work in the game, you should pass the -soundhw hda option to qemu and turn on the appropriate driver when compiling the kernel (Device Drivers -> Sound card support -> HD-Audio).

Hints

For files for frame buffers, textures, etc., we recommend using the anon_inode_getfile function. Unfortunately, such files do not allow lseek, pread, etc. by default – to fix this, set the flags FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE in the f_mode field.

To get the file structure from a file descriptor, you can use fdget and fdput. To see if the structure passed to us is the right type, just compare its file_operations pointer with ours.

We recommend starting the implementation with the FILL_RECT and DRAW_LINE operations (they only require a frame buffer and allow you to see the map). Then we recommend DRAW_COLUMN (you can omit flags and color maps in the beginning) – it is responsible for drawing most of the graphics in the game and you will not see much without it.

The device requires that the texture size be a multiple of 256 bytes. If you create a texture size that is not supported directly by the hardware, align the size up and pad the data from the user with zeros.
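
The rounding and zero-padding can be sketched as follows. This is a user-space stand-in: the function names are hypothetical, and a driver would copy from user space into its own DMA-capable memory instead of calling calloc and memcpy.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TEXTURE_ALIGN 256u

/* Round the texture size up to the next multiple of 256 bytes. */
uint32_t texture_padded_size(uint32_t size)
{
    return (size + TEXTURE_ALIGN - 1) & ~(TEXTURE_ALIGN - 1);
}

/* Returns a newly allocated, zero-padded copy of the texture data
 * (to be released with free), or NULL if allocation fails. */
uint8_t *texture_pad_copy(const uint8_t *data, uint32_t size, uint32_t *padded)
{
    uint8_t *buf;

    *padded = texture_padded_size(size);
    buf = calloc(*padded ? *padded : 1, 1);  /* zero-filled */
    if (!buf)
        return NULL;
    if (size)
        memcpy(buf, data, size);
    return buf;
}
```

Since the maximum texture size is 4MiB, the addition in texture_padded_size cannot overflow for valid inputs.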

The size of the texture or frame buffer is rarely exactly a multiple of a page – we can use this by placing the page table in the unused portion of the last page. This will avoid a separate allocation for a (usually very small) page table.
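
Whether the page table fits into the tail of the last page is simple arithmetic, sketched below. The page size, the size of one page table entry and the alignment required for the page table are assumptions made for this sketch and must be checked against the device documentation; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions, to be checked against the device documentation: */
#define PAGE_BYTES     4096u  /* size of one page */
#define PT_ENTRY_BYTES 4u     /* size of one page table entry */
#define PT_ALIGN       64u    /* required alignment of the page table */

/* Number of pages needed to hold `size` bytes of data. */
uint32_t pages_needed(uint32_t size)
{
    return (size + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* If the page table fits after the data in the last page, store its
 * offset from the start of the buffer in *offset and return true. */
bool page_table_fits_in_tail(uint32_t size, uint32_t *offset)
{
    uint32_t pages = pages_needed(size);
    uint32_t table_off = (size + PT_ALIGN - 1) & ~(PT_ALIGN - 1);
    uint32_t table_len = pages * PT_ENTRY_BYTES;

    if (pages == 0 || table_off + table_len > pages * PAGE_BYTES)
        return false;
    *offset = table_off;
    return true;
}
```

Under these assumptions a 64x200 frame buffer (12800 bytes, 4 pages) leaves enough room for its 16-byte page table, while a buffer whose size is an exact multiple of the page size does not and needs a separate allocation.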

It is not necessary to use the FENCE command and related registers for the solution in the partially-scored synchronous version – just PING_SYNC. The solution in the full asynchronous version will need to use FENCE (in conjunction with the FENCE_WAIT register or with the PING_SYNC command to wait in read).
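
One possible way to use FENCE is to give each batch of commands a sequence number and to remember, per buffer, the number of the last batch that touched it; read then waits until the device reports that number as reached. A hardware counter of limited width wraps around, so the comparison has to be wrap-safe. The sketch below shows only that comparison. The counter width FENCE_BITS is an assumption to be checked against the device documentation, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FENCE_BITS 26u  /* assumed width of the fence counter */
#define FENCE_MASK ((1u << FENCE_BITS) - 1u)

/* Next fence number, wrapping around at the counter width. */
uint32_t fence_next(uint32_t cur)
{
    return (cur + 1u) & FENCE_MASK;
}

/* True if fence `want` has been reached when the device last reported
 * `done`. Valid as long as the two values are less than half of the
 * counter range apart. */
bool fence_reached(uint32_t done, uint32_t want)
{
    uint32_t diff = (done - want) & FENCE_MASK;

    return diff < (1u << (FENCE_BITS - 1));
}
```

The half-range condition holds naturally if the driver never has more than half of the counter range of fences outstanding at once.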

It may happen that we will not be able to send a command due to the lack of space in the queue of commands (the one built into the device or our own in the memory pointed to by CMD_*_PTR). To effectively implement the waiting for free space, we recommend:

  • send the PING_ASYNC command with some minimum frequency (e.g. every 1/8 .. 1/2 of the size of our command buffer or hardware queue)
  • set the PONG_ASYNC interrupt to disabled by default
  • if there is a lack of space in the queue:
    • reset the interrupt PONG_ASYNC in INTR
    • check whether there is still no space in the queue (protection against a race) – if space has become available, go back to sending immediately
    • enable the interrupt PONG_ASYNC in INTR_ENABLE
    • wait for an interrupt
    • disable the PONG_ASYNC interrupt in INTR_ENABLE
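
The bookkeeping behind this scheme can be sketched as follows: how much space is left in a driver-owned command ring, and when the next PING_ASYNC is due. The structure and function names are hypothetical, and the interval of one quarter of the ring is just one value from the recommended range. The interrupt handling itself is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a driver-owned command ring. */
struct cmd_ring {
    uint32_t size;        /* number of command slots */
    uint32_t read_idx;    /* next slot the device will fetch */
    uint32_t write_idx;   /* next slot the driver will fill */
    uint32_t since_ping;  /* commands queued since the last PING_ASYNC */
};

/* Free slots; one slot stays empty so that "full" and "empty" differ. */
uint32_t ring_free(const struct cmd_ring *r)
{
    return (r->read_idx + r->size - r->write_idx - 1) % r->size;
}

/* Account for one queued command. Returns true if a PING_ASYNC
 * should be queued next (here: once per quarter of the ring).
 * The caller must have checked ring_free() beforehand. */
bool ring_push(struct cmd_ring *r)
{
    r->write_idx = (r->write_idx + 1) % r->size;
    r->since_ping++;
    if (r->since_ping >= r->size / 4) {
        r->since_ping = 0;
        return true;
    }
    return false;
}
```

When ring_free returns 0, the driver would follow the steps listed above: clear the interrupt, re-check the free space, and only then enable the interrupt and sleep.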

For the device to work efficiently, you should avoid unnecessarily sending commands that enforce serialization (mainly between commands from a single ioctl call):

  • *_PT, *_ADDR: clears the cache and blocks the batching of adjacent columns by the microcode (in particular, if two adjacent columns have the same colormap_idx, do not send the COLORMAP_ADDR command between them).
  • SURF_DIMS, TEXTURE_DIMS, DRAW_PARAMS, FENCE, PING_SYNC: block the batching of columns (but they are basically free between batches).
  • INTERLOCK: blocks parallel processing of COPY_RECT commands (if the buffer you want to read from was last drawn to before the most recent INTERLOCK command was sent, there is no need to send another – in particular, for a series of COPY_RECT calls between different frame buffers, do not send INTERLOCK between calls).
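
One simple way to avoid redundant state commands is to remember what was last sent to the device. The sketch below does this for COLORMAP_ADDR only; the structure and function names are hypothetical, and a real driver would track the other *_PT and *_ADDR values in the same way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of the state most recently sent to the device. */
struct sent_state {
    bool     colormap_valid;  /* false until the first COLORMAP_ADDR */
    uint32_t colormap_addr;
};

/* Returns true if COLORMAP_ADDR must be sent before the next column,
 * and records the new value in that case. */
bool need_colormap_addr(struct sent_state *st, uint32_t addr)
{
    if (st->colormap_valid && st->colormap_addr == addr)
        return false;
    st->colormap_valid = true;
    st->colormap_addr = addr;
    return true;
}
```

Two adjacent columns with the same color map then produce a single COLORMAP_ADDR command, which leaves the microcode free to batch them.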

You do not need to worry about redundant FILL_COLOR, XY_*, *START, *STEP, PING_ASYNC commands – the device will not take longer to process them than the driver would take to check for redundancy.

If you want to temporarily (for testing) change something in the game (e.g. to comment out operations that are not supported yet), the code responsible for operating the device can be found in src/i_doomdev.c.

If you break the prboom-plus configuration so badly that the game no longer runs well enough to change the settings, you can find its configuration file in $HOME/.prboom-plus/prboom-plus.cfg. Deleting it will restore the default settings.

To see the various operations in action:

  • DRAW_COLUMN without any flags: used to draw interface graphics (menu, etc.).
  • DRAW_COLUMN with COLORMAP: used to draw all walls and most objects.
  • DRAW_COLUMN with TRANSLATE: used to draw the new HUD (available under the F5 key – possibly after pressing it several times). If the palette swap works, some of the digits should not be red.
  • DRAW_COLUMN with FUZZ: enter the idbeholdi code to make yourself invisible (simply by pressing consecutive letters during the game).
  • DRAW_SPAN: used for floor / ceiling drawing.
  • DRAW_LINE and FILL_RECT: used to draw the map (Tab key).
  • COPY_RECT: used to draw the transition effect between game states (starting a new game or completing the level).
  • DRAW_BACKGROUND: reduce the screen size from full (pressing the - button several times) – the screen border should be filled.