Buffer Overflow Series : Understanding the memory layout


Buffer overflow is one of the best-known vulnerabilities in the infosec community. In the early 2000s, buffer overflow vulnerabilities were behind several widespread attacks, such as the Code Red worm, which compromised IIS 5.0, and SQL Slammer, which exploited MS SQL Server 2000. It is a very serious security issue that can be used to escalate privileges and take full control of a computing resource.


Buffer overflows primarily occur in code written in programming languages such as C and C++. In these languages, it is the programmer's responsibility to validate input and perform bounds checks on user-submitted data. In information security, the rule of thumb is to treat every input from the user as suspicious. A buffer overflow is another classic case in which malformed user input takes control of a vulnerable system and performs unauthorized actions. In this series of blog posts, we will look at what buffer overflow attacks are and at various exploitation techniques.


Note: Buffer overflow vulnerabilities have been known since at least the 1980s and are not a new class of exploit. Various measures have been taken since then to mitigate such attacks. On modern machines these exploits are very difficult to carry out, because protections have been added at the hardware, OS and compiler levels (for example non-executable stacks, address space layout randomization and stack canaries). For learning purposes, a machine with these security features disabled is required. One such VM can be downloaded from here. We will use the challenges from this site to learn the concepts and techniques behind buffer overflow attacks.


Before diving into the details, it is important to have a clear understanding of how a program executes and how to analyze the memory where it is mapped. In this blog we will cover the following topics:
  • Common CPU registers 
  • Understanding Assembly Language Code
  • Understanding function calls
  • Analyzing memory contents

Common CPU registers

  • ESP : Points to the top of the stack.
  • EBP : Points to the base of the current stack frame.
  • EIP : Points to the next instruction to be executed.
  • EAX, EBX, ECX, EDX : General-purpose registers (e.g. for performing arithmetic operations)
One of the basic requirements for reverse engineering is the ability to read the assembly code of a compiled executable. A buffer overflow attack requires an understanding of both the assembly code and the program's execution flow in order to take control of execution. We will use a very simple C program (simple-program.c) to see how function calls are made at the assembly level and to analyze the contents of the CPU registers and memory, laying the foundation for buffer overflow attacks. The source code can be downloaded from my GitHub repository. Compile it on the VM suggested at the start of this blog. We will use the GNU Debugger (gdb) to debug the binaries. Please check my blog on gdb for frequently used gdb commands and their descriptions.

Looking under the hood

When a compiled C program is executed, the executable and the libraries it depends on are loaded into main memory (RAM). The mapped memory space can be viewed with the gdb command info proc mappings.
Fig 1. Mapped address space
The interesting areas to note here are:
  • Memory location where the code of the executable is loaded (/opt/protostar/bin/stack0)
  • Memory location where the libc library is loaded (libc-2.11.2.so). Later in the series it will be used to launch ROP gadget attacks.
  • Memory location of the stack
The stack is the area we will focus on in this blog. It is the region of main memory that functions use to store local variables, return addresses during function calls, and temporary values for arithmetic and logical operations. When a function finishes executing, the space allocated to it, along with its local variables, is popped off the stack. For the current process, the stack is mapped from virtual address 0xbffeb000 to 0xc0000000.

Use the commands disassemble main and disassemble sum to get the disassembly of the two functions in AT&T syntax.

Fig 2. AT&T syntax of main

Fig 3. AT&T syntax of sum
Whenever a function is called, the following steps are performed:

  • All the arguments must be on the stack before the function is called. The arguments are pushed from right to left: for the call sum(20,30), argument 30 is pushed first and then 20.
  • The address of the next instruction is stored on the stack. In Fig 2, the call to sum(int,int) is made at location 0x8048403. The address of the instruction following the call, i.e. 0x8048408, gets pushed onto the stack.
  • In Fig 3, the instruction push   %ebp stores the old EBP on the stack. This is the first instruction inside sum(int,int), and the stack frame of the new function starts here.
  • mov    %esp,%ebp makes the current stack top the new base pointer. Now both EBP and ESP point to the same memory address. Run the command x/24wx $esp to view the contents of the stack.

Fig 4. Content of the stack before the start of new function

  • sub    $0x10,%esp reserves 0x10 (16) bytes for local variables. The stack pointer moves up the stack (towards lower memory addresses).
  • mov    0xc(%ebp),%eax moves the second argument (30) into register EAX.
  • mov    0x8(%ebp),%edx moves the first argument (20) into register EDX.
  • lea    (%edx,%eax,1),%eax performs the addition (eax = edx + eax * 1).
  • mov    %eax,-0x4(%ebp) stores the value of EAX (the result) on the stack, 4 bytes below the base pointer EBP. Below is the current stack layout.

Fig 5. Stack content before the exit of the function

  • The leave instruction destroys the stack frame created by the current function (in Fig 6, the area marked in yellow is the stack frame for function 'sum'). It copies EBP into ESP, then pops the old EBP saved in the stack frame of sum() back into the EBP register. ESP now points to RETURN, and we are back in the old stack frame of main(), from where sum() was called.

Fig 6. Stack frame for function 'sum()'

  • The ret instruction pops the RETURN address from the stack and stores it in the EIP register. EIP now points to the instruction following the call to sum, at 0x08048408.

The location of the return address and its offset from the current stack pointer (ESP) are very important during exploit creation. Overwriting this address is what allows an attacker to alter the program's execution flow and execute arbitrary code. Before performing a buffer overflow attack, it is important to clearly understand the steps above and to build a mental map of the stack and of the contents that get pushed onto it as program execution progresses. In the next blog we will look at launching our first buffer overflow attack.

I hope this blog was informative. Feel free to leave your comments, and share it if you find it useful. Follow me on Twitter @PiyushSaurabh and on LinkedIn to be alerted to new posts.

Happy Learning :)
