Software Testing Blog

Code defects series: The science behind code execution with buffer overflows

Reading through the release notes of most software vendors, you will notice that security patches typically contain fixes for a variety of buffer overflow defects. Apple’s Mac OS X update from December is one example.

In this post on understanding defects, we will look at what goes on in a simple buffer overflow and how it allows one to change a program's flow of execution. For the discussion that follows, assume we are executing on an Intel x86 CPU and the operating system is Linux.

First, let’s understand how a program is organized in memory. Broadly, there are three sections: text, data and stack (Figure 1).

Figure 1 – Organization of a program in memory

Text is the read-only section that holds the program's code, that is, the instructions to execute, and the Data region stores things like static variables.

The Stack is the region most relevant to us. It starts at a fixed address, and a register (SP) points to the top of the stack. Elements called ‘stack frames’ are PUSHed when a function is called and POPped when it returns. Among other things, a stack frame contains the value of the instruction pointer at the time of the call. This saved instruction pointer is the key to altering the flow of execution through a buffer overflow.

Let’s first see what the stack looks like with a simple example:

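#include <stdio.h>
#include <string.h>
void password (char *buf) {
    char var[16];
    strcpy(var, buf);  /* no bounds check on buf */
}
void main () {
    password("0123456789");  /* argument is a placeholder; any string under 16 bytes */
    printf("This should be executed first\n");
    printf("This should be executed next\n");
}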

When this program is executed, you will see the following:

This should be executed first
This should be executed next

We compile this code with gcc using the -S option to generate assembly code as output. The following are the relevant parts of the generated code. In main(), we see:

call    password     <- pushes the instruction pointer (IP) onto the stack so that it can be used as the return address (RET)

And in password(), we see this:

pushl   %ebp         <- pushes the caller's frame pointer (EBP) onto the stack; this saved copy is the SFP

movl    %esp, %ebp   <- copies the stack pointer (SP) into EBP, making it the frame pointer for password()

You will notice some other manipulations in the assembly code, but for our discussion these are the most relevant instructions. Once password() has been entered, the stack is laid out as follows.
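One way to look at this layout on your own machine is a small sketch like the one below. It is not part of the original program: `frame_view` and `inspect_frame()` are hypothetical names, and the two `__builtin_*` calls are GCC/Clang extensions rather than standard C. On a 32-bit x86 Linux build like the one assumed in this post, you would expect the buffer to sit at the lowest address, with the frame address and the slot holding RET above it, though compiler padding and stack protection can change the exact distances.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the three values we care about in one frame. */
struct frame_view {
    uintptr_t buffer;   /* address of the local buffer */
    uintptr_t frame;    /* frame address of the current function */
    uintptr_t ret;      /* return address into the caller (the RET value) */
};

/* noinline keeps the function in its own stack frame. */
__attribute__((noinline))
struct frame_view inspect_frame(void)
{
    char var[16];
    struct frame_view v;

    var[0] = '\0';
    v.buffer = (uintptr_t)var;
    v.frame  = (uintptr_t)__builtin_frame_address(0);   /* GCC/Clang builtin */
    v.ret    = (uintptr_t)__builtin_return_address(0);  /* GCC/Clang builtin */

    printf("var[16] at %p\n", (void *)v.buffer);
    printf("frame   at %p\n", (void *)v.frame);
    printf("RET     =  %p\n", (void *)v.ret);
    return v;
}
```

Calling `inspect_frame()` from `main()` and comparing the RET value with a disassembly of `main()` should show it pointing at the instruction right after the call.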

Notice that var[16] sits at the lowest address of the three; immediately above it is the SFP, and above that the return address (RET). strcpy() writes toward higher addresses, so if we pass password() a string longer than the 16 bytes allocated for var, the extra bytes overwrite the SFP and then RET. When password() returns, execution resumes at whatever address RET now holds.

Using GDB, we can look up the address of the second printf() call in main():

(gdb) disassemble main
Dump of assembler code for function main:
0x0804840e <main+0>:    lea    0x4(%esp),%ecx
0x08048412 <main+4>:    and    $0xfffffff0,%esp
0x08048415 <main+7>:    pushl  -0x4(%ecx)
0x08048418 <main+10>:   push   %ebp
0x08048419 <main+11>:   mov    %esp,%ebp
0x0804841b <main+13>:   push   %ecx
0x0804841c <main+14>:   sub    $0x4,%esp
0x0804841f <main+17>:   movl   $0x8048514,(%esp)
0x08048426 <main+24>:   call   0x80483f4 <password>
0x0804842b <main+29>:   movl   $0x804851f,(%esp)
0x08048432 <main+36>:   call   0x8048324 <puts@plt> <- Call to first printf()
0x08048437 <main+41>:   movl   $0x804853d,(%esp)
0x0804843e <main+48>:   call   0x8048324 <puts@plt> <- Call to second printf()
0x08048443 <main+53>:   add    $0x4,%esp
0x08048446 <main+56>:   pop    %ecx
0x08048447 <main+57>:   pop    %ebp
0x08048448 <main+58>:   lea    -0x4(%ecx),%esp
0x0804844b <main+61>:   ret
End of assembler dump.

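So now it is a matter of passing password() a string longer than 16 characters that carries the address 0x0804843e at the position that lands on RET. This overflows the buffer and changes the flow of program execution: when password() returns, control goes straight to the second call and the first printf() is skipped. (0x0804843e is the call instruction itself; to have the argument of the second printf() loaded first, the target would be 0x08048437.)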
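To make the shape of such a string concrete, here is a minimal sketch of how it could be assembled. `build_payload()` is a hypothetical helper, not something from the original program, and the offset of 20 used in the usage note (16 bytes of var plus 4 bytes of SFP) is an assumption; the real distance to RET depends on how the compiler pads the frame, so it has to be confirmed in GDB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: writes `offset` filler bytes, then `addr` in
 * little-endian byte order (the order x86 stores it in memory), then a
 * terminating NUL. `out` must have room for offset + 5 bytes.
 * Returns the length of the resulting string.
 */
size_t build_payload(unsigned char *out, size_t offset, uint32_t addr)
{
    assert(out != NULL);
    memset(out, 'A', offset);                       /* fills var and SFP */
    out[offset + 0] = (unsigned char)(addr & 0xff); /* lowest byte first */
    out[offset + 1] = (unsigned char)((addr >> 8) & 0xff);
    out[offset + 2] = (unsigned char)((addr >> 16) & 0xff);
    out[offset + 3] = (unsigned char)((addr >> 24) & 0xff);
    out[offset + 4] = '\0';
    return offset + 4;
}
```

With an assumed offset of 20, `build_payload(buf, 20, 0x0804843e)` produces twenty 'A' bytes followed by 0x3e 0x84 0x04 0x08. None of the four address bytes is zero, which matters because strcpy() stops copying at the first NUL byte.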

An altered execution flow is just one of the problems caused by buffer overflows. Finding an exploitable overflow and knowing the right address to use is not trivial. Even so, buffer overflows are harmful: at the very least, the program crashes when RET is overwritten with an invalid address. And even when RET is untouched, copying a source buffer that is larger than the destination overwrites neighboring data on the stack and leaves it with unexpected values.
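The underlying problem in password() is that strcpy() never compares the size of the source with the size of the destination. As a rough sketch of what a checked copy could look like, assuming a hypothetical helper named `copy_checked()` that is not part of the original program:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical bounded replacement for the strcpy() call in password().
 * Copies src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success, -1 if src is too long (dst is left unchanged).
 */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;              /* would overflow: refuse to copy */
    memcpy(dst, src, len + 1);  /* len + 1 also copies the NUL */
    return 0;
}
```

Inside password(), the call would become `copy_checked(var, sizeof var, buf)`, with the caller deciding what to do when it returns -1. Rejecting an oversized input rather than silently truncating it is a design choice made here for simplicity.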

If you have interesting sources that you refer to for understanding programming issues such as these, let us know by posting a comment. My personal favorite on buffer overflows is the paper titled ‘Smashing the Stack for Fun and Profit’.
