Class 3: ELF¶
Date: 13.03.2018
About the ELF format¶
ELF is a file format used in Linux (and many other systems) for programs, shared libraries (.so), intermediate build results (.o), and memory dump files (core).
Although the basic features of the ELF format are always the same, there are many elements depending on the processor architecture and sometimes on the operating system. Here we will only deal with the ELF format on the x86 architecture in the Linux system.
Unfortunately, there is no full ELF specification. The “base” ELF format is described in the System V generic ABI (gABI) documentation, and the architecture dependent parts should be described in the appropriate processor-specific ABI (psABI). In practice, however, psABI for many architectures can be hard to come by, very incomplete, or not written at all. The matter is further complicated by ELF extensions: ELF, as an open and flexible format, allows operating systems to define their own types of sections, relocations, symbols, etc. Many of them are not described anywhere.
Probably the most complete single document describing ELF is Sun’s Linker and Libraries Guide. This is basically the documentation of the linker and dynamic libraries on the Solaris operating system, but it contains a full description of the ELF format for i386, x86_64, sparc, sparc64 architectures, as well as several extensions used also on Linux systems (versioning, TLS).
Another useful resource is the elf.h header file from the glibc library (/usr/include/elf.h). It contains constants and ELF structures for the architectures and operating systems supported by glibc. Header files with support for even more architectures can be found in the libbfd library, which is part of binutils.
In this file only a sketch of the ELF format will be presented – for detailed information, I refer to the Linker and Libraries Guide.
ELF - basic structure¶
At the basic level, an ELF file is composed of 4 areas:
- ELF header (at the beginning of the file): contains information about the parameters of the file and the machine it is intended for, as well as information about the position of section and program headers
- section headers: each header describes the type and location of one section. A section is a contiguous block of memory with uniform attributes. Most sections simply describe a memory area that should be created when the program is started and initialized with data from the file, but there are many special types of sections with more complex semantics.
- program headers: each header describes the type and location of one segment. A segment is a contiguous block of memory with uniform purpose and attributes from the point of view of loading and running the program. If the file has both segments and sections, the segments have a one-to-many relationship with the sections (because there may be many sections that the linker must distinguish, but the loader does not).
- contents of sections / segments
Whether sections / program headers may or must be present depends on the type of ELF file. There are 4 types of ELF files:
ET_REL (relocatable file)
- A compiled, but not yet linked file (.o). Usually created as the result of compiling a single source file. It cannot be run directly: these files are an intermediate stage of compilation and are combined by the linker (the ld program) into executable files or dynamic libraries.
- There is also a (rarely used) ability to combine several .o files into one larger file using ld -r.
- As intermediate files, ET_REL files can contain undefined symbols and unresolved references, which are fixed up by further linking steps.
- In the ET_REL type, section headers are required and program headers are not used.
ET_EXEC (executable file)
- A compiled and fully linked program, usually created by combining .o files with the linker. Such a file is ready to be launched: all segments have a fixed address at which they will be available while the program runs. All references in the file are also fixed; the only exceptions are special types of references to shared libraries, limited to one segment. This ensures that almost all the memory content loaded from the executable file is identical in all processes executing the given program, which allows the memory to be shared.
- In the ET_EXEC type, program headers are required. Section headers are not needed to run the program, but are used by debuggers and are usually included.
ET_DYN (shared object file)
- A compiled and linked dynamic library (.so). Very similar to ET_EXEC, but with the following differences:
  - although most of the content is already set (undefined references, like in ET_EXEC, are limited to external references in one segment), the address at which the library will be loaded is not fixed: the library can be loaded at any place in memory.
  - because the library code cannot contain references to its own address, a special code style is used, called PIC (Position-Independent Code). Whenever an address of an object in the library is needed, PIC code must somehow determine its own position and calculate the address of the desired object from it. This code is usually bigger and slower than "regular" code.
  - thanks to the above features, the program can load many dynamic libraries into its address space, and even load them while it is running
- It should be noted that although the ET_DYN type is usually used for libraries, there is nothing to prevent it from being used for the main program as well. This technique is called PIE (Position-Independent Executable) and is sometimes used because it allows full randomization of the process address space.
- An example of an executable ET_DYN file is the libc library (/lib/libc.so.6): it prints its version information on startup. Also, the dynamic linker is implemented as an executable ET_DYN (to avoid an address conflict with the program that it loads).
ET_CORE (core file)
- A process memory dump, created when a process is killed by certain signals. It contains the full state of the process at the time of death, allowing you to open it in the debugger and determine the cause of the problem.
Interestingly, Linux kernel modules (.ko) are of the ET_REL type and are loaded directly by the kernel. The benefits of ET_EXEC and ET_DYN (i.e. shared memory) do not apply in kernel mode, and their disadvantages (the fixed position of ET_EXEC, the inefficiency of PIC) would be quite severe.
ELF header¶
The ELF header contains the following information:
- file format identifier ("\x7fELF")
- file format: little endian or big endian, 32-bit or 64-bit (this determines the format of the remaining structures)
- ELF format version (only 1.0 has been used so far)
- operating system identifier (often ignored)
- ELF file type (ET_*)
- target architecture (EM_386, EM_X86_64, EM_SPARC, …)
- location and size of section and program headers
- address of the program entry point (for ET_EXEC and executable ET_DYN)
Sections¶
The information contained in a section header is:
- section name (a section can have any name, but for standard sections it is customary to use names that begin with a period)
- section type
- section attributes
- size, location in the file, and section alignment
- for ET_EXEC and ET_DYN: the final address of the section in memory (relative to the base address in the case of ET_DYN)
- associated section IDs (for some types)
The section type determines most of its semantics. The more important types are:
SHT_PROGBITS
- normal section, content loaded from a file
SHT_NOBITS
- ordinary section, but the content is filled with zeros instead of being loaded from a file
SHT_SYMTAB
- symbol table – contains information about objects contained in the file and external objects to which this file has references
SHT_STRTAB
- table of strings – contains the names used by section headers and entries in the symbol table
SHT_REL / SHT_RELA
- relocation table: describes the not yet resolved references used in the associated section
SHT_DYNAMIC
- contains information for the dynamic linker
The more important section attributes are:
SHF_WRITE
- the section is writable at runtime
SHF_EXECINSTR
- the section contains executable code
SHF_ALLOC
- the section will be loaded into memory at runtime (sections without this flag are used only by build and debugging tools)
The standard section names used for regular code in C are:
.text
- code section
.rodata
- read-only data section (const int x = 3;)
.data
- data section (int x = 3;)
.bss
- the zeroed data section (int x = 0;)
Segments¶
The information contained in a program header is:
- segment type
- segment attributes
- location of the segment in the file and its address in memory
- size of the segment in the file and size of the segment in memory (if they are different, the remaining part is filled with zeros; this is used for sections of the type SHT_NOBITS)
The more important types of segments are:
PT_LOAD
- “regular” segment: loads the area into memory
PT_DYNAMIC
- marks the area as containing information for the dynamic linker
PT_INTERP
- indicates the file name of the dynamic linker to be used
The only architecture-independent / system-independent attributes are the access rights (rwx).
During linking, PT_LOAD segments are created by merging all sections with the SHF_ALLOC flag that have compatible access rights. All other segments that are used at runtime are contained within PT_LOAD segments.
Symbols and references¶
One of the main tasks of the ELF format is storing information about objects contained in the file and about references to external objects. By object, we mean a function or a (global) variable. From the ELF point of view, an object is simply an area within a section (ET_REL) or the address space of a program (ET_EXEC, ET_DYN).
Symbols are names assigned to objects. The symbol can be defined (assigned to an object in a given file) or undefined (it will be defined at the moment of linking with the file that defines it).
The symbols are stored in the symbol table. The information stored about a symbol is:
- name
- value: position in the section (ET_REL) or in memory (ET_EXEC, ET_DYN)
- the containing section
- size (the size of the variable or of the function code); it can be zero if we're only interested in the address
- type:
  - STT_OBJECT: a global variable
  - STT_FUNC: a function
  - STT_SECTION: a special symbol representing the beginning of the section (used for internal references)
- linking rules:
  - STB_LOCAL: local symbol (static in C), will not participate in linking
  - STB_GLOBAL: global symbol
  - STB_WEAK: weak global symbol (__attribute__((weak)) in gcc), a special variant of a global symbol that automatically "loses" to the usual global symbol with the same name when both are defined
- visibility rules, used to bind symbols between modules (a module is an executable program or a dynamic library):
  - STV_DEFAULT: default rules, the symbol is visible and can be shadowed by a symbol with the same name from another module
  - STV_PROTECTED: the symbol is visible, but references to it from within the containing module will not be shadowed
  - STV_HIDDEN: the symbol is not visible from outside the module; like STB_LOCAL, but at the module level, not the source file level
  - STV_INTERNAL: like STV_HIDDEN, but when the symbol is a function, we also assume that it will never be called from outside the module (which would otherwise be possible by passing a pointer). It can be used to further optimize PIC code.
  These rules can be set in gcc by the appropriate __attribute__.
Symbols are used in the code through references (called relocations). A relocation tells the linker that at a given place in a section, instead of the bytes set at the time of compilation, it should insert the address of a symbol (or some other value unknown at compile time). Relocations are stored in relocation tables (one for each section that requires it). The information stored for each relocation is:
- index of the referenced symbol in the symbol table
- the relocation position in the section
- type of relocation
- addend: an additional component of the value. The exact interpretation depends on the type of relocation; most often it is simply a number added to the relocated value. It can be used, for example, when someone asks for the address of a.y, given the definition struct { int x, y; } a;
There are two types of relocation tables: SHT_REL and SHT_RELA. For SHT_RELA, the addend is stored in the relocation table, whereas for SHT_REL, the addend is stored as the initial content of the relocated space. SHT_REL allows you to reduce the file size, but SHT_RELA is required for architectures with complex relocation types (e.g., two-part relocations of 16 bits each). The i386 architecture always uses SHT_REL, and the x86_64 architecture always uses SHT_RELA.
Relocation types are very dependent on architecture. Most types of relocations are used for dynamic linking. The basic types of relocation on i386 are:
R_386_32
- A 32-bit field is relocated; the relocated value is the address of the symbol + addend. For example, the following code:

  extern struct { int x; int y; } a;
  a.y = 13;

  will look like this in assembler:

  movl $13, a+4

  which translates into machine code as follows:

  c7 05 XX XX XX XX 0d 00 00 00

  where XX XX XX XX should be replaced with the address of a + 4. The assembler will save this in the ELF file section as:

  c7 05 04 00 00 00 0d 00 00 00

  And in the relocation table for this section, it will make a relocation of type R_386_32 referencing the symbol a at position 2 within the section (assuming that this code is at the very beginning of the section).
R_386_PC32
- A 32-bit field is relocated; the relocated value is the symbol address - field address + addend. This type of relocation is used for jump and call instructions (recall that in x86 jump and call instructions the destination is stored as the difference between the destination address and the address of the end of the jump instruction). The following code:

  extern void f (void);
  f();

  which, in assembler, is:

  call f

  will be saved in the machine code as:

  e8 XX XX XX XX .

  where the period marks the end of the instruction and XX XX XX XX is (address of f - address of the period). In the ELF file section this will be saved as:

  e8 fc ff ff ff

  And in the relocation table there will be a relocation of the R_386_PC32 type referencing the f symbol at position 1. Please note that the assembler has set the relocation addend to 0xfffffffc (i.e., -4). This is a correction included because R_386_PC32 is defined as an offset from the beginning of the relocated field, and the jump instruction uses the offset from the end of the jump instruction, i.e. from the end of the relocated field.
The basic types of relocation on x86_64 are:
R_X86_64_64
- A 64-bit field is relocated, analogous to R_386_32.
R_X86_64_32S
- Like R_X86_64_64, but a signed 32-bit field is relocated. If the full 64-bit value cannot be represented by this field, a linking error occurs. On the x86_64 architecture, most immediate operands of instructions can only contain 32-bit signed numbers, so as long as the finished program fits into the lower 2GB of the address space, this type of relocation is used for most code references. If the program becomes too large, it must be compiled with the -mcmodel=large option, which uses only mov instructions to load addresses, supporting the full 64-bit range and using relocation type R_X86_64_64.
R_X86_64_PC32
- Analogous to R_386_PC32.
Function calling convention on x86 architecture on Linux¶
(This is not really part of the topic, but it probably will be useful.)
The i386 architecture basically has 7 general-purpose registers: %eax, %ecx, %edx, %ebx, %ebp, %esi, %edi. In addition, the %esp stack pointer and the %eflags flags register are also available to user programs.
The x86_64 architecture expands all of these registers to 64 bits (%rax, %rcx, %rdx, %rbx, %rbp, %rsi, %rdi, %rsp, %rflags), and adds 8 new general-purpose registers (%r8 - %r15).
The standard calling conventions for i386 architecture are as follows:
- the stack grows down, %esp indicates the top of the stack, which is the smallest address currently in use by the program. Any address on the stack smaller than %esp can be destroyed at any time (e.g. by a call to a signal handler).
- at the entry point to a function (i.e. immediately after executing the call instruction), %esp ≡ -4 (mod 16), and the word at the top of the stack (at %esp) is the return address from the function
- the function should return by removing the return address from the stack (increasing %esp by 4) and jumping to it. This is usually done with the ret instruction.
- the contents of registers %ebx, %ebp, %esi, %edi after returning from the function must be equal to their contents at the moment it was called: the function must either save and restore the value of these registers, or not use them at all
- the contents of the registers %eax, %ecx, %edx, %eflags can be changed by a function without any consequences
- if the function uses parameters, they are passed on the stack, starting at %esp+4 (i.e. immediately after the return address). The function must leave them there: only the return address is removed from the stack
- if the function returns a value, it is stored in %eax.
And for x86_64:
- the stack grows down, %rsp indicates the top of the stack. The 128 bytes below the top of the stack constitute the so-called red zone, i.e. an area that can be used and will not be overwritten, despite being located below the top of the stack (only the area below the red zone can be overwritten by a signal handler). This area is useful in functions that do not call other functions (so-called leaf functions), because it avoids moving the stack pointer if the function does not need a lot of space.
- at the entry point to a function, %rsp ≡ 8 (mod 16), and the word at the top of the stack (at %rsp) is the return address from the function.
- the function should return by removing the return address from the stack (increasing %rsp by 8) and jumping to it. This is usually done with the ret instruction.
- the content of the stack below %rsp at the entrance to the function can be freely modified by it, and the stack above should not be modified.
- the contents of the registers %rbx, %rbp, %r12 - %r15 after returning from the function must be equal to their contents at the moment it was called
- the contents of registers %rax, %rcx, %rdx, %rsi, %rdi, %r8 - %r11 can be changed by the function without any consequences
- parameters to the function are passed in the following registers (in order): %rdi, %rsi, %rdx, %rcx, %r8, %r9. If the function takes more than 6 parameters, the remaining ones are passed on the stack starting at %rsp+8. The function must leave them there: only the return address is removed from the stack.
- if the function returns a value, it returns it in %rax.
The above list does not include passing parameters and returning values other than ints / pointers or stranger x86 registers. For more details, I refer to psABI-i386 and psABI-x86_64.
Dynamic libraries¶
Global Offset Table¶
As previously mentioned, ELF's main design goal for ET_EXEC and ET_DYN was the ability to share code and data between processes. Because external references (i.e. relocations) obviously require modification of the memory content in relation to the "template" contained in the file, it was decided to gather them into one place, limiting the number of pages of memory that cannot be shared.
This place is called the GOT (Global Offset Table). There is one GOT for each module (library or main program) that needs it. It is simply a large array of external symbol addresses required by a given module. When we write a dynamic library (and we use PIC), the compiler automatically generates code that loads the appropriate address from the GOT every time it needs the address of an external object. For ET_EXEC files, several tricks are used so that the compiler does not have to explicitly use the GOT, but the GOT is still used in some form for external function calls.
The GOT is automatically created by the linker when linking a program or a dynamic library. A special relocation table is created in the .rel.dyn section (.rela.dyn on x86_64), in which relocations that fill the GOT are stored (as well as all other relocations required during the dynamic linking process). These relocations are of the type R_<arch>_GLOB_DAT, which (in the case of x86) works identically to R_386_32 or R_X86_64_64, but additionally identifies the purpose of the relocation as a GOT slot fill.
PIC on i386¶
The position-independent code sequences (PIC) are often tricky and their degree of complexity depends on the architecture. The i386 architecture is quite average in this respect – relative jump instructions are available, but there are no other ways of addressing memory relative to the instruction pointer. The basic code sequences used on the i386 architecture are:
- Finding the GOT position:

  call _l1
  _l1: popl %ebx
  addl $_GLOBAL_OFFSET_TABLE_+(.-_l1), %ebx

  In this sequence, the call instruction is used to store the address of the label _l1 (i.e., the "return" address) on the stack. This address is then removed from the stack, and the GOT address is obtained by adding the difference between the GOT address and the _l1 address.
  The dot in the addl instruction (denoting the address of the current instruction) is there for historical reasons: _GLOBAL_OFFSET_TABLE_ is a special symbol understood by the assembler as (GOT address - address of the current instruction). Using this symbol also emits a special relocation R_386_GOTPC (it works like R_386_PC32, but uses the GOT address instead of the destination symbol).
  After the sequence has been executed, the GOT address is in the register %ebx. This is the standard register for the GOT address: according to the calling conventions, it must be set to the GOT address whenever a PLT call is made (see below).
- Finding the address of a local variable (static int x;), having already determined the GOT address:

  leal x@gotoff(%ebx), %ecx

  Since most functions need to find the GOT anyway, the fact that local variables have a fixed offset from the GOT is used: the address of the variable is simply found by adding this difference to the GOT address. x@gotoff is a special assembler syntax for this difference. It corresponds to the relocation R_386_GOTOFF (value = symbol address + addend - GOT address).
- Finding the address of an external variable (extern int x;), having already determined the GOT address:

  movl x@got(%ebx), %ecx

  x@got is a special syntax denoting (address of the GOT slot for x - the GOT address). This instruction simply loads the contents of the appropriate GOT slot. x@got corresponds to the relocation R_386_GOT32. Using this relocation automatically creates a slot in the GOT for the corresponding symbol.
PIC on x86_64¶
The x86_64 architecture always allows the use of memory addressing relative to the instruction pointer. Thanks to this, you can avoid the tricky code sequence that looks for the GOT address and directly address slots in the GOT by their offset from the instructions that use them. For example, finding the address of an external variable (extern int x;) looks like this:

movq x@GOTPCREL(%rip), %rax

This corresponds to the relocation R_X86_64_GOTPCREL.
Moreover, in order to get to a local variable, you do not have to use the GOT in any way: you just have to encode the offset between the instruction and the given variable and use relative addressing. This uses the relocation R_X86_64_PC32, the same one that is used by jump and call instructions.
PLT¶
As an optimization in relation to the above mechanisms, a special mechanism for calling external functions was created: PLT (Procedure Linkage Table), allowing lazy binding of functions by a dynamic linker.
PLT is a special table containing (on x86) code instead of data. Each external function called through the PLT has an entry in the PLT. The entry for function f looks like this (i386):
f@plt:
jmp *f_GOT_PLT_OFF(%ebx)
f_unbound:
pushl $f_REL_OFF
jmp plt0
Or like this (x86_64):
f@plt:
jmpq *f_GOT_PLT(%rip)
f_unbound:
pushq $f_REL_OFF
jmp plt0
And plt0 is a single special entry that looks like this (x86_64 version):
pushq _GLOBAL_OFFSET_TABLE_+8(%rip)
jmpq *_GLOBAL_OFFSET_TABLE_+16(%rip)
Calling a function in PIC code looks like this:
call f@plt
And, in the case of i386, it assumes that %ebx contains the GOT address.
The mechanism works as follows:
- f_GOT_PLT_OFF is an offset in the GOT of a special slot for the given PLT entry
- this slot works like a regular GOT slot, but uses R_<arch>_JMP_SLOT instead of R_<arch>_GLOB_DAT, and is initially set (by the linker) to the offset of the f_unbound label relative to the library base. What's more, the R_<arch>_JMP_SLOT relocation is placed in a special, separate relocation table: .rel.plt
- the dynamic linker, seeing this type of relocation, initially fills this slot with the address of the f_unbound label by adding the base address of the library, instead of looking for the symbol f
- when the program calls f@plt for the first time, the jmp instruction is executed with the contents of the slot, leading to the f_unbound label
- the offset of the R_<arch>_JMP_SLOT relocation corresponding to this slot inside the .rel.plt section is placed on the stack
- the plt0 code pushes the contents of a special GOT slot with offset 4 (or 8 on x86_64) onto the stack. This slot was previously filled by the dynamic linker and contains some kind of identifying handle for the given module
- control is transferred to a special function whose address is taken from another special GOT slot with offset 8 (or 16). This slot is also already filled by the dynamic linker and contains the address of a special function that binds symbols at runtime
- the dynamic linker, using the two parameters on the stack, determines which symbol it is and where to enter its address, after which the GOT slot is refilled with the correct address and control is passed to the function f
- when the program calls f@plt the next time, the slot will already be filled and control will go straight to the f function
ET_EXEC – special tricks for dynamic linking¶
To ensure that compilation of the main program (ET_EXEC file) does not require any knowledge of the GOT / PLT mechanisms in the compiler, two additional tricks are used:
- if the program refers to an external function symbol, a PLT entry for this function within the main program is automatically created, and the address of this PLT entry becomes the "official" address of this function inside the whole process (this is required to ensure the address of this function is fixed while linking the program code, and &f returns the same value throughout the program)
- if the program refers to an external variable symbol, the linker automatically creates a copy of this variable in the main program data segment and emits to .rel.dyn a special relocation R_<arch>_COPY, which will copy the initial contents of this variable from the module that originally defined it. The created copy of the variable becomes the "official" location of this variable at runtime, and the original variable in the defining library is no longer used.
_DYNAMIC structure¶
The _DYNAMIC structure is a table of key:value pairs holding information about the contents of the module for the dynamic linker. It contains mainly:
- address and size of the symbol table involved in dynamic binding
- address and size of the .rel.dyn and .rel.plt tables
- GOT address
- list of libraries required by this module
- a list of library search paths
The dynamic linker finds the _DYNAMIC structure by looking for the PT_DYNAMIC segment.
Program launch sequence and the dynamic linker¶
Running statically linked programs¶
In the case of statically linked programs, the entire program initialization process is performed by the kernel. The kernel reads the ELF header and program headers, and loads all segments into memory. Then it creates the initial state of the program:
- the main thread stack is allocated
- on the main thread stack the following things are placed:
  - program arguments (argc, argv)
  - environment variables (environ)
  - auxiliary vector (auxv)
- the instruction pointer is set to the entry point of the program (from the ELF header). With the standard compilation process, this field is set by the linker to the address of the _start symbol
- the program starts running
Note that _start is not a function: it does not use the standard parameter passing convention, nor can it return. The standard implementation of _start passes the parameters to the main() function, and then executes exit() with the value returned by main() as a parameter.
Dynamic linker¶
Running a dynamically linked program is much more complicated: the kernel cannot do it in its entirety. Instead, a special program called a dynamic linker is used. This program is also known as ld.so (from the name of the file in which it was originally located). On i386 in Linux, the dynamic linker is in the file /lib/ld-linux.so.2, and on x86_64 in /lib64/ld-linux-x86-64.so.2.
The kernel recognizes dynamically linked programs by the presence of a PT_INTERP segment, which contains the name of the file containing the dynamic linker. When it finds such a segment, instead of passing control to the program after loading it, it additionally loads and runs the indicated dynamic linker (which is a file of the ET_DYN type).
The dynamic linker starts by finding its own _DYNAMIC structure and processing its own relocations. In the next phase, it looks through the auxiliary vector (auxv) provided by the kernel. This is a list of key:value pairs describing the state of the process and its environment. It contains, for example, information about the location of the program headers of the main executable file in memory. After locating the executable file, the linker loads (recursively) its dependencies. Then, the linker applies all relocations from .rel.dyn, fills the slots described by .rel.plt relocations with the addresses of the lazy binding stubs, and finally transfers control to the main program (by jumping to the entry point indicated in its ELF header).
The dynamic linker remains in memory after the program has been loaded, and it is possible to continue using its services to open additional libraries, search for symbols, etc. using dlopen, dlsym and other functions. These functions are available by linking with the libdl library.
Useful commands¶
Compilation of a source file in PIC mode:
gcc -c x.c -fPIC
Note: there are two different options to enable PIC: -fpic and -fPIC. Their exact meaning depends on the architecture. If they differ, -fpic uses shorter code sequences, but is limited to smaller programs (e.g. there is a limit of 1021 GOT entries on SPARC). On x86 both versions generate the same code.
Compilation and linking of a dynamic library:
gcc x.c -o libx.so -Wl,-soname=libx.so -shared -fPIC
Dump code, section table, symbols, and other data about a binary file:
objdump -xtrds <file>
Dump information about ELF structures:
readelf -a <file>
Dump the symbol table:
nm <file>
Dump the dynamic symbol table:
nm -D <file>
List of libraries used by the program:
ldd <program>
Thread-Local Storage¶
TLS is a fairly complex extension of ELF and C language. This is a mechanism that adds a new class of variables to the language (in addition to local and global variables): thread-local variables. They are declared like this (outside the scope of a function):
_Thread_local int x;
Thread variables behave similarly to global variables, but each running thread has its own instance. The full implementation is quite complicated (due to the possibility of dynamic creation of both threads and modules defining thread-local variables), so we will limit ourselves here to the general outline.
The thread-local variables are defined at the compilation stage in the sections .tdata and .tbss, with the additional flag SHF_TLS. The variables themselves have the STT_TLS type. When linking, all such sections in the module are collected in one place and a PT_TLS segment describing the place is created.
During execution, the thread-local variables can be stored in one of two places:
- main TLS block: contains all TLS segments belonging to the main program and libraries loaded with it
- additional TLS blocks: contain other TLS segments (i.e. those from libraries loaded by dlopen)
The pointer to the list of additional blocks is stored in the main block for a given thread. The pointer to the main block is stored in a processor register (%gs on i386, %fs on x86_64). Additional blocks are allocated lazily.
There are 4 models of access to thread-local variables, used depending on the situation:
- global dynamic: the most general, loads the handle of the library containing the symbol and the offset of the variable in the TLS segment of this library from the GOT, then calls __tls_get_addr (spelled ___tls_get_addr in the GNU i386 variant) to get the address of this segment (perhaps allocating it)
- local dynamic: as global dynamic, but assumes that we are in the same module as the variable. In the case of access to several variables, the address of the TLS segment is computed only once
- initial exec: used in the general case in ET_EXEC programs and in other situations that guarantee that the variable is in the main TLS block. Loads the offset of the variable in the main TLS block from the GOT
- local exec: used in ET_EXEC programs to access their own variables. Simply stores the offset in the main TLS block in the program code
Debugging and exception handling: DWARF¶
DWARF¶
A format closely associated with ELF is the DWARF (Debugging With Attributed Record Formats) debugging information format. It defines many special sections (with names beginning with .debug), which contain information useful for debugging the program, e.g.
- information on line numbers (.debug_line)
- information about the stack frame format (.debug_frame)
- information about the types and locations of variables (.debug_info)
- information about the macros used in the program (.debug_macro)
This information is included by the compiler only when requested (-g).
Exception handling, unwind mechanism¶
One of the most complex mechanisms required for full C++ implementation is exception handling. This mechanism has three tasks:
- go through the list of all stack frames with exception handlers
- determine which one contains the correct handler
- restore the state of the registers from the appropriate frame and call the exception handling code
The stack unwinding mechanism was created to solve the first and third tasks. It uses the .eh_frame section, which is very similar (but not identical) to .debug_frame, to go through the entire stack and recover the processor state information at the time of each call.
Literature¶
- Linker and Libraries Guide, 2004, Sun Microsystems (chapters 7 and 8) - http://docs.oracle.com/cd/E19683-01/817-3677/817-3677.pdf
- gABI: http://www.uclibc.org/docs/SysV-ABI.pdf
- psABI-i386: http://www.uclibc.org/docs/psABI-i386.pdf
- man dlsym, dlopen
- ELF handling for thread local storage: www.akkadia.org/drepper/tls.pdf
- The DWARF debugging standard: http://www.dwarfstd.org/