Class 2: ELF: modularity¶
Date: 04.03.2025 Small Task #2
Extras¶
Same as for the previous labs
Scenario¶
Modularity in general¶
Virtually every programming language has a concept of modularity, which allows building abstractions and reusing code. This, in turn, facilitates creating libraries: generic pieces of code used by both applications and other libraries. Although these dependency relationships are usually described as a tree, they usually form a DAG, which raises a question: if there are two paths to a dependency, should it have a single "instance" or a set of distinct ones? If the module has global state, the former is usually the expected behavior.
With library evolution (versioning), this inherently creates a resolution problem, the so-called dependency hell, in which it becomes increasingly hard to provide a consistent set of libraries for a set of required applications. With the advent of containerization, and with memory no longer at a premium, duplication is a popular solution. However, this leads to even simple applications weighing hundreds of megabytes.
During this lab, we'll discuss how such challenges are reflected in the context of native code execution.
Memory layout and sharing¶
At a very high level, there are two kinds of "modules" for native code: static (.o) and dynamic (.so).
Static modules are combined in an executable at compile (linking) time,
while dynamic modules are only referenced to be resolved at runtime.
In particular, statically linked code offers more opportunities for optimizations,
while shared libraries offer more portability and memory saving.
A shared library will have only one copy present in the physical memory,
and virtual memory addressing makes it easy to reference it from multiple applications.
To visualize, take a look at this very simplified diagram showing virtual-to-physical memory mapping of two executables (left). Sharing physical memory (right) enabled us to save two "blocks":
[Diagram: virtual-to-physical memory mapping. Process 1 maps A.o, B.o, C.so and D.so; process 2 maps A'.o, D.so, C.so and F.so. In physical memory, C.so and D.so are each present once and mapped by both processes, which leaves two blocks free.]
Very simplified representation of virtual memory paging of (shared) objects.¶
A bit more detailed view¶
Hands-on
You can view a description of a process's virtual memory map by reading the
/proc/<PID>/maps pseudo-file. (A more detailed view is in /proc/<PID>/smaps.)
You can find a detailed description of the format in the kernel docs,
but for now it suffices to say that the columns represent, in order:
start and end addresses of the mapping,
permission for memory pages (rwx),
offset to a file (if present in last column),
(device and inode),
and the mapped file or a marker: "[heap]", "[stack]".
Read the pseudo-file for a process of a simple program. For instance, you may use this trick to read the mapping
of a just-spawned cat process:
cat /proc/self/maps
Think about what you see there and try to cross-reference the offsets and permissions with the segments of the binary file
(look at the "Offset" column). You can list the sections and segments of cat with:
readelf -lS /usr/bin/cat
Check the addresses of libc for various processes: are they the same?
Open the smaps pseudo-file and search for libc.so. Look for the Rss and Pss rows:
- RSS
Is the amount of the mapping currently resident in RAM.
- PSS
Is this process' proportional share of this mapping (i.e., each resident page is divided by the number of processes sharing it, this process included).
Try to estimate how much memory would be required if each process on your machine had its own copy of libc.
00400000-0044f000 r-xp 00000000 1f:04 4 /bin/busybox
0045e000-0045f000 r-xp 0004e000 1f:04 4 /bin/busybox
0045f000-00460000 rwxp 0004f000 1f:04 4 /bin/busybox
77e24000-77e46000 r-xp 00000000 1f:04 288 /lib/libgcc_s.so.1
77e46000-77e47000 r-xp 00012000 1f:04 288 /lib/libgcc_s.so.1
77e47000-77e48000 rwxp 00013000 1f:04 288 /lib/libgcc_s.so.1
77e48000-77ee4000 r-xp 00000000 1f:04 286 /lib/libc.so
77ef3000-77ef5000 rwxp 0009b000 1f:04 286 /lib/libc.so
77ef5000-77ef7000 rwxp 00000000 00:00 0
7f761000-7f782000 rw-p 00000000 00:00 0 [stack]
7ffe0000-7ffe1000 r--p 00000000 00:00 0 [vvar]
7ffe1000-7ffe3000 r-xp 00000000 00:00 0 [vdso]
File types¶
Let's review ELF file types again:
ET_REL (relocatable file)
A compiled, but not yet linked file (.o). Usually created as the result of compiling a single source file. It is not possible to run it directly -- these files are an intermediate stage of compilation and are combined by the linker (the ld program) into executable files or dynamic libraries.
There is also an (uncommonly used) ability to combine several .o files into one larger file using ld -r.
As intermediate files, ET_REL can contain undefined symbols and undefined references -- they will be fixed up by further linking steps.
In the ET_REL type, section headers are required and program headers are not used.
ET_EXEC (executable file)
A compiled and fully linked program, usually created by linking .o files through a linker. Such a file is ready to be launched -- all segments have a fixed address at which they will be available during the program's operation. All references in the file are also fixed -- the only exceptions are special types of references to shared libraries, limited to one segment. This ensures that almost all the memory content loaded from the executable file is identical in all processes executing the given program and allows for sharing the memory.
In the ET_EXEC type, program headers are required. Section headers are not needed to run the program, but are used by debuggers and are usually included.
ET_DYN (shared object file)
A compiled and linked dynamic library (.so). Very similar to ET_EXEC, but with the following differences:
although most of the content is already set (undefined references, like in ET_EXEC, are limited to external references in one segment), the address at which the library will be loaded is not fixed -- the library can be loaded to any place in memory.
because the library code cannot contain references to its own address, a special code style is used, which is called PIC (Position-Independent Code). Whenever an address of an object in the library is needed, PIC code must somehow determine its own position and calculate the address of the desired object from it. This code is usually bigger and slower than "regular" code.
thanks to the above features, the program can load many dynamic libraries into its address space, and even load them during operation.
It should be noted that although the ET_DYN type is usually used for libraries, there is nothing to prevent it from being used for the main program as well -- this technique is called PIE (Position-Independent Executable) and is sometimes used because of the possibility of full randomization of the process address space.
An example of an executable ET_DYN file is the libc library (/lib/libc.so.6) -- it prints its version information when executed. Also, the dynamic linker is implemented as an executable ET_DYN (to avoid an address conflict with the program that it loads).
Interestingly, Linux kernel modules (.ko) are of the ET_REL type and are
loaded directly by the kernel -- the benefits of ET_EXEC and ET_DYN
(i.e., shared memory) do not apply in kernel mode, and their disadvantages
(the fixed position of ET_EXEC, the inefficiency of PIC) would be quite severe.
Sections¶
The information contained in a section header is:
section name (a section can have any name, but for standard sections it is customary to use names that begin with a period)
section type
section attributes
size, location in the file, and section alignment
for ET_EXEC and ET_DYN: the final address of the section in memory (relative to the base address in the case of ET_DYN)
associated section IDs (for some types)
The section type determines most of its semantics. The more important types are:
SHT_PROGBITS
normal section, content loaded from a file
SHT_NOBITS
ordinary section, but the content is filled with zeros instead of being loaded from a file
SHT_SYMTAB
symbol table -- contains information about objects contained in the file and external objects to which this file has references
SHT_STRTAB
table of strings -- contains the names used by section headers and entries in the symbol table
SHT_REL / SHT_RELA
contains information about unresolved references (relocations) used in a given (affiliated) section
SHT_DYNAMIC
contains information for the dynamic linker
The more important section attributes are:
SHF_WRITE
the section is writable at runtime
SHF_EXECINSTR
the section contains executable code
SHF_ALLOC
the section will be loaded into memory at runtime (sections without this flag are used only by build and debugging tools)
Segments¶
The information contained in a program header is:
segment type
segment attributes
location of the segment in the file and its address in memory
size of the segment in the file and size of the segment in memory (if they are different, the remaining part is filled with zeros -- used for sections of the type SHT_NOBITS)
The more important types of segments are:
PT_LOAD
"regular" segment: loads the area into memory
PT_DYNAMIC
marks the area as containing information for the dynamic linker
PT_INTERP
indicates the file name of the dynamic linker to be used
The only architecture-independent / system-independent attributes are the access rights (rwx).
During linking, PT_LOAD segments are created by merging all sections
with the SHF_ALLOC flag that have compatible access rights. All other segments
that are used at runtime are contained within PT_LOAD segments.
Memory map, again¶
As we've seen, a shared library needs an extra per-process segment to enable resolving symbols. Below is a slightly less simplified virtual (left) to physical (right) memory mapping for three processes: two running the same binary A and one executing B, all of which use a shared library L.
[Diagram: virtual-to-physical memory mapping for three processes. Processes 1 and 2 (both running A) and the process running B each map the text and data of their binary, plus L.so.text and L.so.rel. In physical memory, A.text is shared by the two A processes and L.so.text by all three processes, while A.data, B.data and L.so.rel exist as a separate copy for each process.]
Somewhat less simplified representation of virtual memory and instances of the same executable.¶
Relocatable code and PIC¶
Tip
In the following tasks run the compiler with the following flags to simplify the generated code:
CFLAGS="-no-pie -march=haswell -fno-asynchronous-unwind-tables -fcf-protection=none -fno-stack-protector"
Hands-on
Create two C files which depend on each other like these two:
/* part_a.c */
extern int YOUR_DATA;
int bar(int*);

static int DATA = 42;

static void baz() {
    int a = 32 * 6;
}

int foo(int c) {
    DATA = YOUR_DATA;
    baz();
    return bar(&DATA);
}

/* part_b.c */
int foo(); // oopsie

int bar(int* arg) {
    return *arg + 4;
}

int YOUR_DATA = 1337;

int main() {
    return foo();
}
Compile these files without linking like so:
gcc -c part_a.c -o part_a.o $CFLAGS
gcc -c part_b.c -o part_b.o $CFLAGS
Observe them under objdump -dr and readelf -a.
You should see various kinds of relocations, which we will discuss next.
Link these files together by calling gcc with:
gcc part_a.o part_b.o -o parts $CFLAGS -Wl,-emit-relocs
Examine the result with objdump: how were the holes resolved by the linker?
Try adding other flags like -fpic, -fno-pic, -fno-plt, or -mcmodel=large.
Compile and examine the files again.
Try invoking the linker directly with:
ld part_a.o part_b.o -o parts
What happened? Does the executable still work?
If there is a discrepancy, consider executing gcc in verbose mode (-v) to see what flags it passes to the linker.
Examine the crt1.o (or crt0.o) file.
Tip
Use objdump -xtrds to dump code, section table, symbols, and other data about a binary file
in a single command.
Symbols¶
One of the main tasks of the ELF format is storing information about objects
contained in the file and about references to external objects. By object,
we mean a function or a (global) variable. From the ELF point of view,
an object is simply an area within a section (ET_REL) or the address space
of a program (ET_EXEC, ET_DYN).
Symbols are names assigned to objects. The symbol can be defined (assigned to an object in a given file) or undefined (it will be defined at the moment of linking with the file that defines it).
The symbols are stored in the symbol table. The information stored about a symbol is:
name
value: position in the section (ET_REL) or memory (ET_EXEC, ET_DYN)
the containing section
size (the size of the variable or size of the function code); it can be zero if we're only interested in the address
type:
STT_OBJECT
a global variable
STT_FUNC
a function
STT_SECTION
a special symbol representing the beginning of the section (used for internal references)
linking rules:
STB_LOCAL
local symbol (static in C) -- will not participate in linking
STB_GLOBAL
global symbol
STB_WEAK
weak global symbol (__attribute__((weak)) in gcc) -- a special variant of a global symbol that automatically "loses" to the usual global symbol with the same name when both are defined
visibility rules -- used to bind symbols between modules (a module is an executable program or a dynamic library):
STV_DEFAULT
default rules -- the symbol is visible and can be shadowed by a symbol with the same name from another module
STV_PROTECTED
the symbol is visible, but references to it from within the containing module will not be shadowed
STV_HIDDEN
the symbol is not visible from outside the module -- like STB_LOCAL, but at the module level, not the source file level
STV_INTERNAL
like STV_HIDDEN, but when the symbol is a function, we also assume that it will never be called from outside the module (which would be possible by passing the pointer). It can be used to further optimize PIC code.
These rules can be set in gcc by the appropriate __attribute__.
Relocations¶
Symbols can be used in the code by references (called relocations). A relocation tells the linker that, at a given place in a section, it should replace the bytes set at compile time with the address of a symbol (or some other value unknown at compile time). Relocations are stored in relocation tables (one for each section that requires it). The information stored for each relocation is:
index of the referenced symbol in the symbol table
the relocation position in the section
type of relocation
addend: an additional component to the value -- the exact interpretation depends on the type of relocation, most often it is simply a number added to the relocated value. It can be used, for example, when someone asks for the address of a.y, when we have the definition struct { int x, y; } a;
There are two types of relocation tables: SHT_REL and SHT_RELA.
For SHT_RELA, the addend is stored in the relocation table, whereas
for SHT_REL, the addend is stored as the initial content of the relocated
space. SHT_REL allows you to reduce the file size, but SHT_RELA is
required for architectures with complex relocation types (e.g., two-part
relocations of 16 bits each). The i386 architecture always uses SHT_REL,
and the x86_64 architecture always uses SHT_RELA.
Relocation types are very dependent on architecture. Most types of relocations are used for dynamic linking. The basic types of relocation on i386 are:
R_386_32
A 32-bit field is relocated, the relocated value is the address of the symbol + addend. For example, the following code:
extern struct { int x; int y; } a;
a.y = 13;
will look like this in assembly:
movl $13, a+4
which translates into machine code as follows:
c7 05 XX XX XX XX 0d 00 00 00
where XX XX XX XX should be replaced with the address of a + 4. The assembler will save this in the ELF file section as:
c7 05 04 00 00 00 0d 00 00 00
And in the relocation table for this section, it will make a relocation of type R_386_32 referencing the symbol a at position 2 within the section (assuming that this code is at the very beginning of the section).
R_386_PC32
A 32-bit field is relocated, the relocated value is the symbol address - field address + addend. This type of relocation is used for jump and call instructions (recall that x86 jump and call instructions store the destination as the difference between the destination address and the address of the end of the instruction). The following code:
extern void f (void);
f();
which, in assembly, is:
call f
will be saved in the machine code as:
e8 XX XX XX XX
where XX XX XX XX is (address of f - address of the end of the call instruction). In the ELF file section, this will be saved as:
e8 fc ff ff ff
And in the relocation table there will be a relocation of the R_386_PC32 type referencing the f symbol at position 1. Please note that the assembler has set the relocation addend to 0xfffffffc (i.e., -4) -- this is a correction included because R_386_PC32 is defined as an offset from the beginning of the relocated field, and the jump instruction uses the offset from the end of the instruction, i.e., from the end of the relocated field.
The basic types of relocation on x86_64 are:
R_X86_64_64
A 64-bit field is relocated, analogous to R_386_32.
R_X86_64_32S
Like R_X86_64_64, but a signed 32-bit field is relocated. If the full 64-bit value cannot be represented by this field, a linking error occurs. On the x86_64 architecture, most immediate parameters for instructions can only contain 32-bit signed numbers -- so long as the finished program fits into the lower 2 GB of the address space, this type of relocation is used for most code references. If the program becomes too large, it must be compiled with the -mcmodel=large option, which uses only mov instructions to load addresses, supporting the full 64-bit range and using relocation type R_X86_64_64.
R_X86_64_PC32
Analogous to R_386_PC32.
Large Task #1 - again¶
Head over to Assignment 1: Binary file converter to discuss relocations there.
Small Task #2¶
Write a small wrapper dynamic library that will hijack calls to printf
and print the address of the format string before the actual message.
The injection should happen by preloading the library with LD_PRELOAD like so:
LD_PRELOAD=libtroll.so LD_LIBRARY_PATH=. ./parts
You can start by modifying the following code:
#include <stdio.h>
#include <stdarg.h>
void my_printf(const char* fmt, ...)
{
    /* %p expects a pointer to void, hence the cast */
    printf("[%p] ", (const void*)fmt);
    va_list args;
    va_start(args, fmt);
    vprintf(fmt, args);
    va_end(args);
}

int main()
{
    my_printf("printing %d + %d = %s\n", 2, 7, "no idea");
}
Make sure your code calls printf again to show you can write a proper wrapper:
override the function and call the original.
If you want to see it in action with typical programs like gcc,
override the int __printf_chk(int flag, const char* fmt, ...) function as well.
Hint
man 3 dlsym printf strcat
man 8 ld.so
Extra Topics¶
If you want to extend your knowledge beyond the scope of this course, consider:
reading about Thread-Local Storage and DWARF,
reading the whole ELF Reference,
writing a similar wrapper to Small Task #2, but using static linking (the --wrap=symbol flag to ld),
reading about startup tricks with crt and the implementation in a small libc (uClibc),
reading about linker scripts (.lds files).
Literature¶
Linker and Libraries Guide, 2004, Sun Microsystems (chapters 7 and 8) - http://docs.oracle.com/cd/E19683-01/817-3677/817-3677.pdf
psABI-i386a: http://www.uclibc.org/docs/psABI-i386.pdf
man dlsym, dlopen
ELF handling for the thread local storage: www.akkadia.org/drepper/tls.pdf
The DWARF debugging standard: http://www.dwarfstd.org/