mkinitcpio optimization (Arch Linux)

introduction

This article describes the results of my efforts to reduce the execution time of the /sbin/mkinitcpio shell script.

The script is used by Arch Linux to generate the initramfs image for an installed kernel. It is run on every system installation and kernel upgrade. See more information about mkinitcpio.

test configuration

Processor: Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz
Memory: 2GB (1.5 GB free)
Kernel: 2.6.29-ARCH
mkinitcpio: 0.5.23-1

description

Every kernel upgrade executes

mkinitcpio -p kernel26

This generates 2 images: default and fallback. A dry run of the default image generation is

mkinitcpio

and for the fallback image:

mkinitcpio -S autodetect

The two dry runs take 8 and 29 seconds respectively, 37 seconds in total. This seems a little too long.

Note: all measurements refer to the simulation (dry run), so the image is not written to disk. Writing the image takes an extra 5.5 seconds.
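Timings like the ones above can be reproduced with a small helper. The following is a minimal sketch, not the script used for the measurements in this article; the function name elapsed is made up for illustration, and it assumes GNU date (for the %N format) and bash.

```shell
#!/bin/bash
# Print the wall-clock time, in seconds, taken by the given command.
# The command's own output is discarded so that only the timing is printed.
elapsed() {
    local start end ms
    start=$(date +%s%N)             # GNU date: nanoseconds since the epoch
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    ms=$(( (end - start) / 1000000 ))
    printf '%d.%03d\n' $(( ms / 1000 )) $(( ms % 1000 ))
}
```

Called as

elapsed mkinitcpio -S autodetect

it would print the duration of the fallback dry run.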

structure

mkinitcpio has a nice layered design:

Simple measurements show that the bottom layer has the greatest overhead. If all of its routines executed instantly, the total execution time would definitely be below 5 seconds. This is close to the image write time.

The guiding principle is to modify only the outer layers and leave the middle one untouched, so as not to break compatibility with the existing hooks.

optimizations

All of the work was done in two stages.

In the first stage, the bottlenecks were identified and patched where possible. This resulted in 5 patches, which bring the execution time down to 3 and 10 seconds respectively, for a total of 13 seconds. Some details:

The aim of the second stage was to run the generation in parallel. The chosen approach was to run the hooks in parallel while still generating the default and fallback images sequentially, as this scales better. On 2 CPUs the total time improved to 9-10 seconds.
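The general idea of running hooks in parallel can be sketched in bash with background jobs. This is only an illustration under simplifying assumptions, not the worker script from the download section: the function name run_hooks_parallel is hypothetical, each hook is assumed to be a command that only writes to standard output, and the coordination of shared state between real hooks is not addressed.

```shell
#!/bin/bash
# Run every given hook as a background job, then print their outputs
# in the original hook order. Returns non-zero if any hook failed.
run_hooks_parallel() {
    local tmpdir hook pid n i=0 status=0 pids=()
    tmpdir=$(mktemp -d) || return 1
    for hook in "$@"; do
        # Each job writes to its own file, so outputs never interleave.
        "$hook" > "$tmpdir/$i.out" &
        pids+=("$!")
        i=$(( i + 1 ))
    done
    # Wait for every job and remember whether any of them failed.
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1
    done
    # Concatenate in launch order to keep the result deterministic.
    for (( n = 0; n < i; n++ )); do
        cat "$tmpdir/$n.out"
    done
    rm -rf "$tmpdir"
    return "$status"
}
```

Collecting the outputs in launch order means the result does not depend on which job finishes first, which is one way to keep a parallel run equivalent to the sequential one.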

The correctness is obviously preserved.

Some detailed benchmarks for stage 1 are available here.

download

/sbin/mkinitcpio  /lib/initcpio/functions  /lib/initcpio/install/autodetect  /lib/initcpio/worker  time
default           default                  default                           -                     37 sec.
optimized         optimized                optimized                         -                     13 sec.
parallel          (optimized)              parallel                          worker                9-10 sec.

For simplicity, patches are not provided. The files can be diffed manually against the originals.

All the available resources are gathered here.

issues

While the first stage is complete, the parallel version still has some concurrency issues.

feedback from others

Processor: Intel(R) Celeron(R) CPU 2.80GHz

          default  optimized  parallel
default   13s      5s         5s
fallback  47s      16s        15s
preset    60s      21s        20s

Thanks to: gonzacz

As expected, on a single CPU the parallel version performs about the same as the optimized version.

going further

The current bottleneck is the add_module function. If it is replaced with an empty function, the total execution time drops below 4 seconds. Resolving module dependencies is the most expensive part of it.
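Replacing a function with an empty one is easy in bash, because a later definition overrides an earlier one. The sketch below shows the technique on stand-ins: the add_module body, add_all and stub_add_module are all hypothetical and only mimic the shape of the real code in /lib/initcpio/functions.

```shell
#!/bin/bash
# Stand-in for the real routine, which lives in /lib/initcpio/functions.
add_module() { echo "resolving $1"; }

# Hypothetical caller that adds several modules in a row.
add_all() {
    local m
    for m in "$@"; do
        add_module "$m"
    done
}

# Redefining a function replaces the earlier definition for all callers,
# so after this call add_module does nothing (":" is the shell no-op).
stub_add_module() {
    add_module() { :; }
}
```

Timing a run before and after calling stub_add_module gives an estimate of how much of the total time is spent inside the stubbed routine.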

With the current approach (using /bin/bash) I believe it is impossible to reduce the execution time significantly further.

Alternatively, it is worth considering a migration to a more powerful scripting language with efficient data structures (such as Perl), which should handle the task easily. Module dependencies could be stored in a map, which makes lookups much faster. The major drawback is the need to rewrite everything from scratch. In my opinion (based on benchmarks) the expected gain is worth it. The result should be fast enough to eliminate the need to parallelize the initramfs build.
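To illustrate the map idea, here is a minimal sketch that keeps module dependencies in an associative array and resolves them transitively. It is written in bash only to stay in the same language as the rest of this article and assumes bash 4 or later (for declare -A); it says nothing about how a Perl rewrite would perform. The input format "module: dep1 dep2" is a simplified assumption, not the exact layout of modules.dep, and the names DEPS, SEEN, load_deps and resolve are made up for the example.

```shell
#!/bin/bash
# Requires bash >= 4 for associative arrays.
declare -A DEPS      # module name -> space-separated direct dependencies
declare -A SEEN      # modules that have already been printed

# Read lines of the form "module: dep1 dep2" from stdin into DEPS.
load_deps() {
    local name rest
    while IFS=: read -r name rest; do
        [[ -n $name ]] && DEPS[$name]=${rest# }
    done
}

# Print a module followed by all of its transitive dependencies,
# each exactly once. Every lookup is a single map access.
resolve() {
    local mod=$1 dep
    [[ -n ${SEEN[$mod]} ]] && return
    SEEN[$mod]=1
    echo "$mod"
    for dep in ${DEPS[$mod]}; do
        resolve "$dep"
    done
}
```

The dependency table is loaded once, for example with load_deps < deps.txt (a hypothetical file in the format above). After that, resolving a module does not require scanning the dependency file again.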

author

Krzysztof Kostałkowicz

kk219459@students.mimuw.edu.pl

created: 10 May 2009

last modified: 10 Sep 2009