Understanding ELF using readelf and objdump
Contributed by Mulyadi Santosa in Misc on 2006-06-16 00:00:00
Page 2 of 3

B. Examining the Section Header Table (SHT).

Let's see what kinds of sections exist inside our program (the output is shortened):

$ readelf -S test

There are 28 section headers, starting at offset 0x80c:

Section Headers:
[Nr] Name      Type      Addr     Off    Size   ES Flg Lk Inf Al
........
[ 4] .dynsym   DYNSYM    08048174 000174 000060 10   A  5   1  4
........
[11] .plt      PROGBITS  08048290 000290 000030 04  AX  0   0  4
[12] .text     PROGBITS  080482c0 0002c0 0001d0 00  AX  0   0  4
........
[20] .got      PROGBITS  080495d8 0005d8 000004 04  WA  0   0  4
[21] .got.plt  PROGBITS  080495dc 0005dc 000014 04  WA  0   0  4
........
[22] .data     PROGBITS  080495f0 0005f0 000010 00  WA  0   0  4
[23] .bss      NOBITS    08049600 000600 000008 00  WA  0   0  4
........
[26] .symtab   SYMTAB    00000000 000c6c 000480 10     27  2c  4
........

The .text section is where the compiler puts executable code. As a consequence, this section is marked as executable ("X" in the Flg field). In this section, you will see the machine code of our main() procedure:

$ objdump -d -j .text test

-d tells objdump to disassemble the machine code, and -j tells objdump to focus on a specific section only (in this case, the .text section).

08048370 <main>:
 .......
 8048397:       83 ec 08                sub    $0x8,%esp
 804839a:       ff 35 fc 95 04 08       pushl  0x80495fc
 80483a0:       68 c1 84 04 08          push   $0x80484c1
 80483a5:       e8 06 ff ff ff          call   80482b0
 80483aa:       83 c4 10                add    $0x10,%esp
 80483ad:       83 ec 08                sub    $0x8,%esp
 80483b0:       ff 35 04 96 04 08       pushl  0x8049604
 80483b6:       68 d3 84 04 08          push   $0x80484d3
 80483bb:       e8 f0 fe ff ff          call   80482b0
 .......
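The two call instructions in the listing do not store the address 80482b0 directly. They store a 32-bit displacement relative to the address of the next instruction. The following Python sketch reproduces that arithmetic, assuming the usual 5-byte x86 near-call encoding (opcode 0xe8 followed by a signed little-endian displacement). The function name call_target is made up for this illustration.

```python
import struct

def call_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the absolute target of a 5-byte x86 'call rel32' instruction."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a 5-byte call rel32 instruction")
    # The displacement is a signed 32-bit little-endian integer.
    (rel32,) = struct.unpack("<i", insn_bytes[1:])
    # It is relative to the address of the *next* instruction.
    next_insn = insn_addr + len(insn_bytes)
    return (next_insn + rel32) & 0xFFFFFFFF

# Addresses and bytes copied from the objdump listing above.
print(hex(call_target(0x80483A5, bytes.fromhex("e806ffffff"))))  # 0x80482b0
print(hex(call_target(0x80483BB, bytes.fromhex("e8f0feffff"))))  # 0x80482b0
```

Both calls resolve to 0x80482b0, which falls inside the address range of the .plt section shown in the section header table (0x08048290 plus 0x30 bytes).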

The .data section holds all the initialized variables in the program that do not live on the stack. "Initialized" here means the variable is given an initial value, as we did with "global_data". How about "local_data"? No, local_data's value isn't in .data, since it lives on the process's stack.

Here is what objdump finds in the .data section:

$ objdump -d -j .data test
.....
080495fc <global_data>:
 80495fc:       04 00 00 00           ....

.....

One thing we can conclude so far is that objdump kindly does the address-to-symbol translation for us. Without looking into the symbol table, we know that 0x080495fc is the address of global_data. There, we clearly see that it is initialized with 4. Please note that the executables installed by most Linux distributions have been stripped, so there are no entries left in their symbol table (.symtab). This makes it difficult for objdump to interpret the addresses.
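Reading "04 00 00 00" as the value 4 relies on x86 being little-endian: the least significant byte is stored first. A small Python sketch of that decoding follows, assuming global_data is a 4-byte signed integer (the symbol table reports a size of 4, but the signedness is an assumption here).

```python
import struct

# The four bytes objdump shows at address 0x80495fc.
raw = bytes.fromhex("04000000")

# "<i" means little-endian, signed 32-bit integer.
(value,) = struct.unpack("<i", raw)
print(value)  # 4

# Reading the same bytes as big-endian gives a very different number,
# which is why the byte order matters when interpreting a dump.
(wrong,) = struct.unpack(">i", raw)
print(hex(wrong))  # 0x4000000
```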

And what is .bss? BSS (Block Started by Symbol) is the section where all uninitialized global and static variables are mapped. You might think "everything surely has an initial value". True: on Linux, all such uninitialized variables are set to zero, which is why the .bss section is just a bunch of zeroes. For character-type variables, that means the null character. Knowing this, we know that global_data_2 is assigned 0 at runtime:

$ objdump -d -j .bss test-lagi
Disassembly of section .bss:
.....
08049604 <global_data_2>:
 8049604:       00 00 00 00               ....
.....

Previously, we briefly mentioned the symbol table. This table is useful for finding the correlation between a symbol name (a non-external function or a variable) and an address. With -s, readelf will decode the symbol table for you:

$ readelf -s ./test
Symbol table '.dynsym' contains 6 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
.....
     2: 00000000    57 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.0 (2)
.....

Symbol table '.symtab' contains 72 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
.....    
    49: 080495fc     4 OBJECT  GLOBAL DEFAULT   22 global_data
.....
    55: 08048370   109 FUNC    GLOBAL DEFAULT   12 main
.....
    59: 00000000    57 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.0
.....
    61: 08049604     4 OBJECT  GLOBAL DEFAULT   23 global_data_2
.....

"Value" denotes the address of the symbol. For example, if an instruction refers to this address (e.g. pushl 0x80495fc), it refers to global_data. printf() is treated differently, since it is a symbol that refers to an external function. Remember that printf is defined in glibc, not inside our program. Later, I will explain how our program calls printf.

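The address-to-symbol lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how objdump is implemented: the table is typed in by hand from the .symtab listing, and the function name symbolize is hypothetical.

```python
# (value, size, name) entries copied from the .symtab listing above.
SYMBOLS = [
    (0x080495FC, 4, "global_data"),
    (0x08048370, 109, "main"),
    (0x08049604, 4, "global_data_2"),
]

def symbolize(addr):
    """Return 'name' or 'name+0xoffset' for addr, or None if no symbol covers it."""
    for value, size, name in SYMBOLS:
        if value <= addr < value + size:
            offset = addr - value
            return name if offset == 0 else f"{name}+{offset:#x}"
    return None

print(symbolize(0x080495FC))  # global_data
print(symbolize(0x08048397))  # main+0x27
print(symbolize(0x080482B0))  # None
```

The last lookup fails because printf is undefined (UND) in our program and has the value 0 in the table, so the call target 0x80482b0 is not covered by any symbol listed here.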
C. Examining the Program Header Table (PHT).

As I explained previously, a segment is the way the operating system "sees" our program. So let's see how our program will be segmented:

$ readelf -l test
.....
There are 7 program headers, starting at offset 52

Program Headers:
     Type     Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
[00] PHDR     0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
[01] INTERP   0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
[02] LOAD     0x000000 0x08048000 0x08048000 0x004fc 0x004fc R E 0x1000
[03] LOAD     0x0004fc 0x080494fc 0x080494fc 0x00104 0x0010c RW  0x1000
[04] DYNAMIC  0x000510 0x08049510 0x08049510 0x000c8 0x000c8 RW  0x4
[05] NOTE     0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
[06] STACK    0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version 
	  .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini 
	  .rodata .eh_frame 
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss 
   04     .dynamic 
   05     .note.ABI-tag 
   06
Note: I added numbers to the left of each PHT entry to make the section-to-segment mapping easier to study.

The mapping is quite straightforward. For example, 15 sections are mapped into segment number 02, and the .text section is one of them. The segment's flags are R and E, which means it is readable and executable. If you see W in a segment's flags, the segment is writable.
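Inside the file, these flags are stored as bits in the p_flags field of each program header. The sketch below turns such a bit mask back into the three-character string readelf prints. The constants PF_X, PF_W and PF_R are the values defined by the ELF specification; the numeric masks 5, 6 and 4 used in the example are inferred from the letters in the listing above, since readelf does not print them, and the function name flags_to_str is made up.

```python
# Segment permission bits as defined by the ELF specification.
PF_X = 0x1  # executable
PF_W = 0x2  # writable
PF_R = 0x4  # readable

def flags_to_str(p_flags: int) -> str:
    """Render p_flags as a 3-character 'RWE' field, with spaces for unset bits."""
    order = ((PF_R, "R"), (PF_W, "W"), (PF_X, "E"))
    return "".join(ch if p_flags & bit else " " for bit, ch in order)

print(repr(flags_to_str(PF_R | PF_X)))  # 'R E', like segment 02
print(repr(flags_to_str(PF_R | PF_W)))  # 'RW ', like segment 03
print(repr(flags_to_str(PF_R)))         # 'R  ', like segment 01
```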

By looking at the "VirtAddr" column, we can discover the virtual start address of each segment. Back to segment #02: its start address is 0x08048000. Later in this section, we will discover that the address range given here isn't exactly the range the segment occupies in memory. You can ignore PhysAddr, because Linux always operates in protected mode (on 32-bit and 64-bit Intel/AMD processors), so the virtual address is the one that matters.

Segments come in many types, but let's focus on two:

  • LOAD: The segment's content is loaded from the executable file. "Offset" denotes the offset in the file where the kernel should start reading. "FileSiz" tells us how many bytes must be read from the file.

    For example, segment #02 is actually the content of the file from offset 0 up to 0x4fc (Offset + FileSiz). To speed up execution, the file's content is read on demand: it is only read from disk when it is referenced at runtime.

  • STACK: The segment is the stack area. It is interesting to see that all the fields except "Flg" and "Align" are 0. Is it an error? No, it is valid. It is the kernel's job to decide where the stack segment starts and how big it is. Remember that on Intel-compatible processors, the stack grows downward (the stack pointer is decremented each time a value is pushed).
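The Offset + FileSiz arithmetic for the two LOAD entries can be checked with a short Python sketch. The numbers are copied from the readelf -l output above; the Load type and the describe function are illustrative names only.

```python
from typing import NamedTuple

class Load(NamedTuple):
    offset: int  # "Offset" column
    vaddr: int   # "VirtAddr" column
    filesz: int  # "FileSiz" column
    memsz: int   # "MemSiz" column

# The two LOAD entries from the program header table above.
SEG2 = Load(0x000000, 0x08048000, 0x004FC, 0x004FC)
SEG3 = Load(0x0004FC, 0x080494FC, 0x00104, 0x0010C)

def describe(seg: Load) -> dict:
    return {
        "file_end": seg.offset + seg.filesz,  # where reading from the file stops
        "mem_end": seg.vaddr + seg.memsz,     # where the segment ends in memory
        "zero_fill": seg.memsz - seg.filesz,  # bytes in memory with no file content
    }

for name, seg in (("02", SEG2), ("03", SEG3)):
    info = describe(seg)
    print(name, {k: hex(v) for k, v in info.items()})
```

For segment 03, MemSiz is 8 bytes larger than FileSiz. That matches the size of .bss in the section header table (0x8 bytes at address 0x08049600): the section is of type NOBITS, so it takes space in memory but none in the file.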

Curious to see the real layout of the process's segments? We can use the /proc/<pid>/maps file to reveal it, where <pid> is the PID of the process we want to observe. Before we move on, we have a small problem: our test program runs so fast that it ends before we can even dump the related /proc entry. I use gdb to solve this. You can use another trick, such as inserting a sleep() call before main() returns.

In a console (or a terminal emulator such as xterm), do:

$ gdb test
(gdb) b main
Breakpoint 1 at 0x8048376
(gdb) r
Breakpoint 1, 0x08048376 in main ()

Hold right here, open another console, and find out the PID of the program "test". If you want the quick way, type:

$ cat /proc/`pgrep test`/maps

You will see output like the following (yours may differ):

[1]  0039d000-003b2000 r-xp 00000000 16:41 1080084  /lib/ld-2.3.3.so
[2]  003b2000-003b3000 r--p 00014000 16:41 1080084  /lib/ld-2.3.3.so
[3]  003b3000-003b4000 rw-p 00015000 16:41 1080084  /lib/ld-2.3.3.so
[4]  003b6000-004cb000 r-xp 00000000 16:41 1080085  /lib/tls/libc-2.3.3.so
[5]  004cb000-004cd000 r--p 00115000 16:41 1080085  /lib/tls/libc-2.3.3.so
[6]  004cd000-004cf000 rw-p 00117000 16:41 1080085  /lib/tls/libc-2.3.3.so
[7]  004cf000-004d1000 rw-p 004cf000 00:00 0
[8]  08048000-08049000 r-xp 00000000 16:06 66970    /tmp/test
[9]  08049000-0804a000 rw-p 00000000 16:06 66970    /tmp/test
[10] b7fec000-b7fed000 rw-p b7fec000 00:00 0
[11] bffeb000-c0000000 rw-p bffeb000 00:00 0
[12] ffffe000-fffff000 ---p 00000000 00:00 0

Note: I added a number to each line for reference.

Back to gdb, type:

(gdb) q

So, in total, we see 12 segments (also known as Virtual Memory Areas, or VMAs). Focus on the first and the last fields. The first field denotes the VMA's address range, while the last field shows the backing file. Do you see the similarity between VMA #8 and segment #02 listed in the PHT? The difference is that the PHT says the segment ends at 0x080484fc, but VMA #8 ends at 0x08049000. The same thing happens with VMA #9 and segment #03: the PHT says it starts at 0x080494fc, while the VMA starts at 0x08049000.
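If you want to pick these fields apart in a script rather than by eye, a line of /proc/<pid>/maps can be split as in the Python sketch below. It assumes the field layout seen in the dump above (address range, permissions, offset, device, inode, optional path) and expects the raw line without the reference number added in this article. The function name parse_maps_line is made up.

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line into its fields."""
    # At most 6 fields; the path (if any) is kept whole in the last one.
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "offset": int(fields[2], 16),
        "dev": fields[3],
        "inode": int(fields[4]),
        # Anonymous mappings (stack, heap, etc.) have no backing file.
        "path": fields[5].strip() if len(fields) > 5 else "",
    }

vma8 = parse_maps_line("08048000-08049000 r-xp 00000000 16:06 66970    /tmp/test")
print(hex(vma8["start"]), hex(vma8["end"]), vma8["perms"], vma8["path"])

vma11 = parse_maps_line("bffeb000-c0000000 rw-p bffeb000 00:00 0")
print(hex(vma11["end"] - vma11["start"]), repr(vma11["path"]))
```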

There are several facts we must observe:

  1. Even though the VMA starts at a different address, the related sections are still mapped at their exact virtual addresses.

  2. The kernel allocates memory on a per-page basis, and the page size is 4 KB. Thus, every page address is a multiple of 4 KB, e.g. 0x1000, 0x2000 and so on. So, for the first page of VMA #9, the page's address is 0x08049000. Technically speaking, the start address of the segment is rounded down (aligned) to the nearest page boundary.
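The rounding described in the two points above can be written out as a small Python sketch. It assumes the 4 KB page size stated in the text and uses the VirtAddr and MemSiz values of the two LOAD segments. The helper names are hypothetical.

```python
PAGE_SIZE = 0x1000  # 4 KB, as assumed in the text

def page_floor(addr: int) -> int:
    """Round addr down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)

def page_ceil(addr: int) -> int:
    """Round addr up to the next page boundary (unchanged if already aligned)."""
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

def vma_range(vaddr: int, memsz: int) -> tuple:
    """Page-aligned range covering a segment at vaddr that is memsz bytes long."""
    return page_floor(vaddr), page_ceil(vaddr + memsz)

# Segment 02 and segment 03 from the program header table.
print([hex(a) for a in vma_range(0x08048000, 0x004FC)])
print([hex(a) for a in vma_range(0x080494FC, 0x0010C)])
```

The two results, 0x08048000-0x08049000 and 0x08049000-0x0804a000, are the ranges reported for VMA #8 and VMA #9 in the maps output.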

Last, which one is the stack? That is VMA #11. Usually, the kernel allocates several pages dynamically and maps them at the highest possible virtual addresses in user space to form the stack area. Simply speaking, each process's address space is divided into two parts (this assumes an Intel-compatible 32-bit processor): user space and kernel space. User space occupies the range 0x00000000-0xbfffffff, while kernel space starts at 0xc0000000.

So, it is clear that the stack is assigned an address range near the 0xc0000000 boundary. The end address is static, while the start address changes according to how many values are stored on the stack.

Discussion(s)
nice
Written by trakos on 2007-12-28 11:31:24
Nice work!

Khali The Great
Written by Khali The Great on 2008-05-09 06:02:24
Real Nice

Thanks!
Written by iron on 2008-06-12 20:39:23
This was a very informative article.

WOW!
Written by someone on 2008-06-18 11:37:24
Wow!

Two process with the same entry point a
Written by Sandeep Tuppad on 2008-09-12 01:11:59
If there are two executables (almost the same, with a few differences) which have the same entry point address, then how will the execution environment run these two processes? Because if there are two different processes, then their virtual address spaces shouldn't overlap in Linux. But I printed the main address of the two processes with the same entry point (__start) and they were the same. How is this possible? Kindly clarify.

Note: I am running Linux and the two processes on a MIPS processor.

Written by Christian on 2008-09-15 22:49:35
Please correct me if I'm wrong, but they will each run in their "own" virtual address space; the two virtual address spaces are independent of each other.


process 1 entry point 0x00000001----------> address space for proc 1

process 2 entry point 0x00000001----------> address space for proc 2