.. _z3-device:

===============
Acceldev Device
===============

The **Acceldev** device is attached to the computer via the PCI bus. You will find the necessary information in ``acceldev.h``.

The device does not have memory of its own and uses the main memory of the computer with Direct Memory Access (DMA). To overcome memory fragmentation, it uses virtual addresses and page tables in its own format.

The device supports ``ACCELDEV_MAX_CONTEXTS`` (255) independent contexts, which do not share memory. These contexts should be tied to the driver contexts.

The device is designed to allow user commands to run without additional kernel-mode validation, while ensuring that users cannot access system memory or other contexts on the device. If user commands are invalid or trigger an error, the device marks the context as errored and raises an interrupt to notify the driver.

Fair sharing of compute time across contexts is not guaranteed; one context may occupy the device, preventing others from running. Dealing with this issue is outside the scope of this assignment.

The device is controlled using MMIO registers. It has only one BAR (BAR0), used for these registers, and uses a single PCI interrupt line. The MMIO area is 64 KiB in size, but only some of this range is used for registers. All documented registers are 32-bit, little-endian, and should be accessed only through aligned 32-bit reads and writes.

Buffers and Paging
==================

Data and commands used by the device are stored in paged buffers. Each context can have ``ACCELDEV_NUM_BUFFERS`` (16) buffers bound. These are configured using ``ACCELDEV_CONTEXTS_CONFIGS`` with an array of ``acceldev_context_on_device_config`` structures and the ``ACCELDEV_DEVICE_CMD_TYPE_BIND_SLOT`` device command.
Except for ``ACCELDEV_CONTEXTS_CONFIGS``, which uses 64-bit contiguous memory, all buffers and page tables use 40-bit physical addresses, 22-bit virtual addresses in the buffer, and pages of size ``ACCELDEV_PAGE_SIZE`` (4 KiB). The page tables are single-level.

- The kernel passes a 64-bit physical address of the buffer's page table to the device.
- Bits 12–21 of the virtual address select the page table entry, which contains the physical address of the page.
- Bits 0–11 of the virtual address represent the offset within the page.

Page tables are 4 KiB in size and contain 1024 entries, each being a 32-bit little-endian word. Each page table entry has the following format:

- Bit 0: ``PRESENT``. If set, the entry is valid. If not set, using the entry raises a ``MEM_ERROR``.
- Bits 4–31: ``PA``. Bits 12–39 of the page's physical address. Bits 0–11 are always zero; pages must be aligned.

Sending Commands
================

The device supports two types of commands:

- Device commands, sent and validated by the driver.
- User commands (also called context commands), sent by running a code buffer via the ``ACCELDEV_DEVICE_CMD_TYPE_RUN`` command.

Device Commands
---------------

Device commands consist of ``ACCELDEV_DEVICE_CMD_WORDS`` (5) 32-bit little-endian words. They are sent via the ``CMD_MANUAL_FEED`` registers:

- ``BAR0 + 0x008c: CMD_MANUAL_FREE``
  Read-only register. Shows how many full commands may be queued before a ``FEED_ERROR`` occurs. The queue holds ``CMDS_BUFFER_SIZE`` (255) commands. Assume the queue is empty after a device reset.
- ``BAR0 + 0x008c – BAR0 + 0x009c: CMD_MANUAL_FEED``
  Five write-only registers for writing command words. Writing the last (4th counting from 0) word at ``BAR0 + 0x009c`` submits the command. Submitting when the queue is full (``CMD_MANUAL_FREE == 0``) raises a ``FEED_ERROR`` interrupt.

NOP Command
~~~~~~~~~~~

Does nothing. Can be used to fill the queue if you feel like it.
Only the header word is used; the other words are ignored. To submit the command you only need to write the 0th and 4th words.

- 0th word: header

  - Command type: ``0x0``

FENCE Command
~~~~~~~~~~~~~

Signals that all commands submitted before it have been processed.

- 0th word: header

  - Command type: ``0x3``

- 1st word: 32-bit value ``VAL``

Behavior:

1. Waits for completion of all previous commands.
2. Sets ``CMD_FENCE_LAST`` to ``VAL``.
3. If ``VAL == CMD_FENCE_WAIT``, triggers the ``FENCE_WAIT`` interrupt.

Registers:

- ``BAR0 + 0x00a0: CMD_FENCE_LAST``
  32-bit read/write register. Set to ``VAL`` while processing FENCE.
- ``BAR0 + 0x00a4: CMD_FENCE_WAIT``
  32-bit read/write register. Used to schedule the ``FENCE_WAIT`` interrupt.

RUN Command
~~~~~~~~~~~

Schedules user commands on a context.

- 0th word: header

  - Bits 0–3: command type ``0x1``
  - Bits 4–31: context ID

- 1st–2nd words: lower/upper 32 bits of the code buffer page table address
- 3rd word: offset (in bytes) of the first command
- 4th word: size (in bytes) of the commands to process (``n_commands * ACCELDEV_USER_CMD_WORDS * sizeof(uint32_t)``)

BIND_SLOT Command
~~~~~~~~~~~~~~~~~

Binds or unbinds a data buffer to a slot for a given context. Binding and unbinding buffers can also be done through ``ACCELDEV_CONTEXTS_CONFIGS``, but that method is unsafe while the device may be using the bound buffers. Therefore, ``BIND_SLOT`` is preferred when the context is already running.

- 0th word: header

  - Bits 0–3: command type ``0x2``
  - Bits 4–31: context ID

- 1st word: slot number
- 2nd–3rd words: lower/upper 32 bits of the data buffer page table address

Unbinding a buffer is done by replacing it with a new buffer or by submitting 0 as the buffer page table address.

User Commands
-------------

User commands consist of ``ACCELDEV_USER_CMD_WORDS`` aligned 32-bit little-endian words stored in a DMA buffer. Always write the full number of words, even if the command is shorter.

Supported commands:

- ``NOP (0x0)``
- ``FENCE (0x1)``
- ``FILL (0x2)``

FENCE Command
~~~~~~~~~~~~~

1. Waits for previous user commands to finish.
2. Increments ``fence_counter`` in the context config.
3. Triggers the ``USER_FENCE_WAIT`` interrupt.

- 0th word: header

  - Command type: ``0x1``

FILL Command
~~~~~~~~~~~~

Fills part of a buffer with a value.

- 0th word: header

  - Command type: ``0x2``

- 1st word: 32-bit value
- 2nd word: buffer slot
- 3rd word: start offset (bytes)
- 4th word: length (bytes)

A real accelerator would support more interesting commands, but as the goal here is system programming, these will suffice. If you are interested in this topic, refer to :ref:`z3-driver-onnx`.

Control Registers
=================

``BAR0 + 0x0008: ENABLE``
   Controls whether the device processes commands. If 0, commands are not processed.

``BAR0 + 0x000c – 0x0010: CONTEXTS_CONFIGS``
   Attaches the contexts' configuration memory. The configuration is stored in contiguous DMA memory containing an array of ``acceldev_context_on_device_config`` structures, one for each of the ``ACCELDEV_MAX_CONTEXTS`` contexts.

   - ``BAR0 + 0x000c``: lower 32 bits of address
   - ``BAR0 + 0x0010``: upper 32 bits of address

Interrupts
==========

The device uses six internal interrupts, all multiplexed into one PCI interrupt.

- **FENCE_WAIT**: completion of a FENCE command
- **FEED_ERROR**: command queue full
- **CMD_ERROR**: invalid device command
- **MEM_ERROR**: invalid memory access
- **SLOT_ERROR**: request for inactive slot
- **USER_FENCE_WAIT**: user FENCE command triggered

An interrupt becomes *active* when its event occurs and *inactive* when cleared by writing 1 to its bit in the ``INTR`` register.

Independently, each of the above interrupts can be either enabled or disabled at any given time. The driver selects the enabled subset of interrupts by writing an appropriate mask to the ``INTR_ENABLE`` register. The device signals an interrupt on its PCI interrupt line if and only if there is an interrupt that is both enabled and active.

``BAR0 + 0x0000: INTR``
   Interrupt status register. Each bit corresponds to an interrupt. Reading returns 1 for active, 0 for inactive. Writing resets (sets to inactive) all interrupts for which 1s were written.

``BAR0 + 0x0004: INTR_ENABLE``
   Interrupt enable register. Same bit layout as ``INTR``. A bit value of 1 enables the interrupt. Upon machine reset, the register is set to 0, blocking the device from signaling a PCI interrupt until the driver is loaded.

Starting the Device
===================

To start the device, follow these steps:

1. Clear ``INTR`` by writing all 1s.
2. Enable required interrupts via ``INTR_ENABLE``.
3. Attach context configs in ``ACCELDEV_CONTEXTS_CONFIGS``.
4. Enable all device blocks via ``ENABLE``.
5. Optionally set ``CMD_FENCE_LAST`` and ``CMD_FENCE_WAIT``.

To shut down, write 0 to both ``ENABLE`` and ``INTR_ENABLE``.

If the device reports an error, reset it by repeating the start-up procedure.
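
As a rough illustration of the start-up and shutdown steps above, the following C sketch performs the register writes against an in-memory stand-in for BAR0. The register offsets come from this document. The macro and function names (``REG_*``, ``reg_write32``, ``acceldev_start``, ``acceldev_stop``) are hypothetical, and the interrupt mask and ``ENABLE`` value are passed in by the caller because their bit layouts are not given here. A real driver would take the names from ``acceldev.h`` and write to the mapped BAR through the kernel's MMIO accessors instead of a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets as documented above; the macro names are hypothetical. */
#define REG_INTR            0x0000
#define REG_INTR_ENABLE     0x0004
#define REG_ENABLE          0x0008
#define REG_CONTEXTS_CFG_LO 0x000c
#define REG_CONTEXTS_CFG_HI 0x0010
#define BAR0_SIZE           0x10000 /* the MMIO area is 64 KiB */

/* Plain-memory stand-in for the mapped BAR0 (assumption for this sketch).
 * It only stores values; it does not model write-1-to-clear on INTR. */
static uint32_t fake_bar0[BAR0_SIZE / sizeof(uint32_t)];

/* Registers must be accessed with aligned 32-bit writes only. */
static void reg_write32(volatile uint32_t *bar0, size_t off, uint32_t val)
{
    assert(off % sizeof(uint32_t) == 0 && off < BAR0_SIZE);
    bar0[off / sizeof(uint32_t)] = val;
}

/* Steps 1-4 of the start-up procedure; optional step 5 is left out. */
static void acceldev_start(volatile uint32_t *bar0, uint64_t configs_dma,
                           uint32_t intr_mask, uint32_t enable_bits)
{
    /* 1. Clear all pending interrupts by writing all 1s. */
    reg_write32(bar0, REG_INTR, 0xffffffffu);
    /* 2. Enable the interrupts the driver wants to receive. */
    reg_write32(bar0, REG_INTR_ENABLE, intr_mask);
    /* 3. Attach the context config array: lower, then upper 32 bits. */
    reg_write32(bar0, REG_CONTEXTS_CFG_LO, (uint32_t)configs_dma);
    reg_write32(bar0, REG_CONTEXTS_CFG_HI, (uint32_t)(configs_dma >> 32));
    /* 4. Let the device start processing commands. */
    reg_write32(bar0, REG_ENABLE, enable_bits);
}

/* Shutdown: write 0 to both ENABLE and INTR_ENABLE. */
static void acceldev_stop(volatile uint32_t *bar0)
{
    reg_write32(bar0, REG_ENABLE, 0);
    reg_write32(bar0, REG_INTR_ENABLE, 0);
}
```

Keeping every access behind a single ``reg_write32`` helper makes the alignment rule easy to enforce in one place; recovering from a reported error would then amount to calling ``acceldev_start`` again.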