Linux ELF binary .text section general layout study
---------------------------------------------------

The following document is an applied examination of the general form of
the code section of an ELF binary created on Linux by gcc. The purpose of
this document is to provide an understanding of the executable data that
is present in a compiled binary, both in a generic sense and in some more
specific detail based on differing versions of the compiler system.

I wrote the following out of curiosity regarding all those strange
functions you see when spending your life in a debugger, and although
I've called it a 'document', it's more accurately described as a series
of notes on the subject matter. As such, the emphasis is on data, not
portraying an idea or concept.

All systems are tested with the following scenario, an empty main
function.

/* START scenario.c */
int main(void)
{
        return;
}
/* END scenario.c */

Compiled with:

parabola:~# gcc -o scenario scenario.c

Any comments should be directed to salvia@undernet (or slv if it's
juped).

System Alpha
------------

This is the base system by which all other data will be compared.
The specifications are as follows:

CPRU: Pentium 100 (i386)
OPSY: Linux 2.4.27
GCCV: 2.95.4 20011002 (Debian prerelease)
LIBC: 2.2.5

parabola:~# readelf -S scenario
There are 27 section headers, starting at offset 0x794:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        080480f4 0000f4 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048108 000108 000020 00   A  0   0  4
  [ 3] .hash             HASH            08048128 000128 00002c 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          08048154 000154 000060 10   A  5   1  4
  [ 5] .dynstr           STRTAB          080481b4 0001b4 000073 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          08048228 000228 00000c 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         08048234 000234 000020 00   A  5   1  4
  [ 8] .rel.dyn          REL             08048254 000254 000008 08   A  4   0  4
  [ 9] .rel.plt          REL             0804825c 00025c 000018 08   A  4   b  4
  [10] .init             PROGBITS        08048274 000274 000025 00  AX  0   0  4
  [11] .plt              PROGBITS        0804829c 00029c 000040 04  AX  0   0  4
  [12] .text             PROGBITS        080482e0 0002e0 000120 00  AX  0   0 16
  [13] .fini             PROGBITS        08048400 000400 00001c 00  AX  0   0  4
  [14] .rodata           PROGBITS        0804841c 00041c 000008 00   A  0   0  4
  [15] .data             PROGBITS        08049424 000424 000010 00  WA  0   0  4
  [16] .eh_frame         PROGBITS        08049434 000434 000004 00  WA  0   0  4
  [17] .dynamic          DYNAMIC         08049438 000438 0000c8 08  WA  5   0  4
  [18] .ctors            PROGBITS        08049500 000500 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        08049508 000508 000008 00  WA  0   0  4
  [20] .got              PROGBITS        08049510 000510 00001c 04  WA  0   0  4
  [21] .bss              NOBITS          0804952c 00052c 000018 00  WA  0   0  4
  [22] .comment          PROGBITS        00000000 00052c 000120 00      0   0  1
  [23] .note             NOTE            00000000 00064c 000078 00      0   0  1
  [24] .shstrtab         STRTAB          00000000 0006c4 0000cf 00      0   0  1
  [25] .symtab           SYMTAB          00000000 000bcc 000480 10     26  37  4
  [26] .strtab           STRTAB          00000000 00104c 000206 00      0   0  1

We can see that the .text section is 0x120 bytes long. Consider that the
single action of returning from a function takes 0x2 bytes (leave and
ret, one byte each). Add to this the prolog (push %ebp; mov %esp,%ebp)
and we get 0x5 bytes.
However, the main function we get from the scenario executable is in
fact 0xf bytes long due to a near jmp and padding after the return.
Let's look at the position of the main function in the .text section:

0x80483c0 <main>

The main function resides a full 0xe0 bytes from the start of the .text
section. Let us examine these 0xe0 bytes (and the few bytes that follow
main as well) by starting at the memory address 0x080482e0, the start of
the .text section.

The function residing at 0x080482e0 is particularly interesting as it is
the so-called "entry point" of the binary.

parabola:~# readelf -h scenario | grep Entry
  Entry point address:               0x80482e0

This means that upon execution of the binary from the shell, control
gets passed to the function at 0x80482e0, meaning it is the first
function to be called. Let's find out what this function is, and look at
exactly what it does.

(gdb) disas 0x080482e0
Dump of assembler code for function _start:
0x80482e0 <_start>:     xor    %ebp,%ebp
0x80482e2 <_start+2>:   pop    %esi
0x80482e3 <_start+3>:   mov    %esp,%ecx
0x80482e5 <_start+5>:   and    $0xfffffff0,%esp
0x80482e8 <_start+8>:   push   %eax
0x80482e9 <_start+9>:   push   %esp
0x80482ea <_start+10>:  push   %edx
0x80482eb <_start+11>:  push   $0x8048400
0x80482f0 <_start+16>:  push   $0x8048274
0x80482f5 <_start+21>:  push   %ecx
0x80482f6 <_start+22>:  push   %esi
0x80482f7 <_start+23>:  push   $0x80483c0
0x80482fc <_start+28>:  call   0x80482cc <__libc_start_main>
0x8048301 <_start+33>:  hlt
0x8048302 <_start+34>:  mov    %esi,%esi
End of assembler dump.

We can see from this that _start is in fact a wrapper for
__libc_start_main, which is itself a library function contained,
predictably, in the library "libc". The call into 0x80482cc is deceptive,
as it is actually a call into scenario's PLT, or Procedure Linkage Table.
The PLT is the glue that holds a library function to a binary. Whenever a
library function is called, control passes to the PLT, and then to the
library function. A description of how the PLT works is beyond the scope
of this document, and may be found in the ELF TIS document.
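
As an aside, the entry point that readelf -h reports above is just a
field of the ELF header, and it can be pulled out by hand. The following
is a minimal sketch only: the function name elf32_entry is my own
invention for illustration, and it assumes a little-endian 32-bit ELF
image (as on the i386 systems here) has already been read into memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any library.
 * Stores the e_entry field of a little-endian ELF32 header in *entry
 * and returns 0, or returns -1 if buf does not look like such a header.
 */
int elf32_entry(const unsigned char *buf, size_t len, uint32_t *entry)
{
    /* Need the magic plus everything up to the end of e_entry. */
    if (len < 28 || memcmp(buf, "\177ELF", 4) != 0)
        return -1;

    /* e_ident[4] is EI_CLASS (1 = ELFCLASS32),
       e_ident[5] is EI_DATA  (1 = ELFDATA2LSB, little endian). */
    if (buf[4] != 1 || buf[5] != 1)
        return -1;

    /* e_entry sits at offset 24: 16 bytes of e_ident, then
       e_type (2), e_machine (2) and e_version (4). */
    *entry = (uint32_t)buf[24]
           | (uint32_t)buf[25] << 8
           | (uint32_t)buf[26] << 16
           | (uint32_t)buf[27] << 24;
    return 0;
}
```

Fed the first 52 bytes of the scenario binary, this should hand back the
same address that readelf prints, which is also the address gdb
attributes to _start.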

When __libc_start_main is called we notice the following: the first
argument is the address of main, the second argument (%esi) is argc, the
third argument (%ecx) is the address of argv, the fourth argument is the
address of _init, the fifth argument is the address of _fini, the sixth
argument (%edx) is a function pointer to be registered with atexit(), and
the final argument (%esp) is the highest stack address available for
future code. Notice that %esp has previously been 16-byte aligned by the
"and" instruction.

The __libc_start_main function is in charge of running the main
function. Before the main function is run, however, several
initializations and other checks are made. These include: initialization
of the thread library, ensuring suid binaries receive open standard file
descriptors, initialization of any "auxiliary" vectors, registering the
dynamic linker's destructor and _fini with atexit(), and calling the
general initialization function _init. The main function is run in the
form exit(main()), i.e. the main function is run, then the process is
exited immediately after the fact. (Interestingly, there is a section of
eight NOPs immediately following the exit call, which in turn is followed
by 127 bytes of further instructions. These instructions fall under the
__libc_start_main symbol but do not appear to be part of the
__libc_start_main function proper.)

Let us now turn our attention to the _init or initialization function,
the important portion of which is the following three instructions:

0x8048287 <_init+19>:   call   0x8048304 <call_gmon_start>
0x804828c <_init+24>:   call   0x8048388 <frame_dummy>
0x8048291 <_init+29>:   call   0x80483d0 <__do_global_ctors_aux>

The function call_gmon_start initializes the gmon profiling system. This
system is enabled when binaries are compiled with the -pg flag, and
creates output for use with gprof(1). In the case of the scenario binary,
call_gmon_start is situated directly after the _start function.
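
Returning for a moment to __libc_start_main: the order of events
described above can be sketched in a few lines of C. This is a loose
illustration and not glibc's code. The names sketch_start_main, demo_init
and demo_main are made up for the example, the thread, suid and auxiliary
vector work is left out, the stack-end argument is dropped, and the
sketch returns main's result where the real function hands it to exit().

```c
#include <stdlib.h>

typedef int  (*main_fn)(int, char **, char **);
typedef void (*hook_fn)(void);

/*
 * Hypothetical, heavily simplified stand-in for __libc_start_main.
 * Argument order follows the pushes made by _start, minus the final
 * stack-end pointer.
 */
int sketch_start_main(main_fn mainp, int argc, char **argv,
                      hook_fn init, hook_fn fini, hook_fn rtld_fini)
{
    /* atexit() handlers run in reverse order of registration, so the
       dynamic linker's cleanup, registered first, runs last. */
    if (rtld_fini != NULL)
        atexit(rtld_fini);
    if (fini != NULL)
        atexit(fini);           /* _fini in the real binary */

    if (init != NULL)
        init();                 /* _init runs before main */

    /* The environment pointers start right after argv's closing NULL.
       The real function would call exit() on this result. */
    return mainp(argc, argv, &argv[argc + 1]);
}

/* Demo stand-ins for _init and main, for illustration only. */
static int init_ran;

void demo_init(void)
{
    init_ran = 1;
}

int demo_main(int argc, char **argv, char **envp)
{
    (void)argv;
    (void)envp;
    return init_ran ? argc : -1;    /* -1 would mean init was skipped */
}
```

Calling sketch_start_main(demo_main, argc, argv, demo_init, NULL, NULL)
with a NULL-terminated argv followed by a NULL-terminated environment
runs demo_init first and then returns whatever demo_main returns.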

The call_gmon_start function finds the last entry in the Global Offset
Table (also known as __gmon_start__) and, if it is not NULL, passes
control to the specified address. The __gmon_start__ element points to
the gmon initialization function, which starts the recording of profiling
information and registers a cleanup function with atexit(). In our case,
however, gmon is not in use, and as such __gmon_start__ is NULL.

The frame_dummy function is a wrapper for __register_frame_info, which is
in turn a wrapper for __register_frame_info_bases. We will examine the
purpose of this function shortly. It should be noted that
__register_frame_info is called with two arguments:

0x804839a <frame_dummy+18>:     push   $0x804952c
0x804839f <frame_dummy+23>:     push   $0x8049434
0x80483a4 <frame_dummy+28>:     call   0x80482ac <__register_frame_info>

The first argument, 0x8049434, is the address of the ELF section
.eh_frame, while the second argument is the address of the .bss, which
__register_frame_info treats as a "struct object". That is to say,
__register_frame_info is passed memory for an object structure in
uninitialized (zero-filled) static storage. If .eh_frame is NULL then
__register_frame_info_bases simply returns; otherwise it initializes the
object structure discussed above.

The purpose of .eh_frame and the accompanying register and deregister
functions is to provide call frame (stack unwinding) information in the
DWARF2 format, which is used primarily for exception handling. In our
case there is no such information, and .eh_frame is NULL, meaning that
__register_frame_info_bases is unused.

Finally we reach __do_global_ctors_aux. This function is in charge of
dispatching any constructor entries found in the .ctors section.
It achieves this by doing roughly the following:

do {
        /* slot 0 holds the entry count, or -1 if it must be counted */
        unsigned long nptrs = (unsigned long) __CTOR_LIST__[0];
        unsigned i;
        if (nptrs == (unsigned long)-1)
                /* count entries up to the terminating NULL */
                for (nptrs = 0; __CTOR_LIST__[nptrs + 1] != 0; nptrs++);
        /* call the constructors, last entry first */
        for (i = nptrs; i >= 1; i--)
                __CTOR_LIST__[i] ();
} while (0);

Now that we have examined both the initialization and main execution of
this scenario, we are left only with the task of examining _fini, the
cleanup function registered with atexit() mentioned above. It should be
mentioned that neither _init nor _fini resides in the .text section;
they are instead situated in sections called .init and .fini
respectively. We can see from the readelf output above that the .fini
section directly follows the .text section. In this scenario, _fini is
only in charge of calling any destructors:

0x8048414 <_fini+20>:   call   0x8048330 <__do_global_dtors_aux>

The __do_global_dtors_aux function is largely comparable to
__do_global_ctors_aux, where instead of running all the constructors
found in .ctors it runs all the destructors found in .dtors.

Let us now summarise the layout of the .text section in scenario:

0x80482e0 <_start>:
        entry point function, wrapper for __libc_start_main.
0x8048304 <call_gmon_start>:
        initialize execution profiling
0x8048330 <__do_global_dtors_aux>:
        dispatch any destructors found in .dtors
0x8048380 <fini_dummy>:
        the significance of this function is unclear. it consists of a
        prolog and epilog, but nothing more.
0x8048388 <frame_dummy>:
        wrapper for __register_frame_info, DWARF2 initialization.
0x80483ac <init_dummy>:
        see fini_dummy. possibly used to align main.
0x80483c0 <main>:
        main function as per source code
0x80483d0 <__do_global_ctors_aux>:
        dispatch any constructors found in .ctors
0x80483f4
        see fini_dummy. possibly used to align .fini

While _init and _fini reside in their own sections:

0x8048274 <_init>:
        initialize process for execution.
0x8048400 <_fini>:
        cleanup process at end of execution.

System Beta
-----------

CPRU: Pentium 100 (i386)
OPSY: Linux 2.6.8.1
GCCV: 3.3.5 (Debian 1:3.3.5-2)
LIBC: 2.3.2

parabola:~# readelf -S scenario
There are 33 section headers, starting at offset 0x1e58:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048128 000128 000020 00   A  0   0  4
  [ 3] .hash             HASH            08048148 000148 000028 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          08048170 000170 000050 10   A  5   1  4
  [ 5] .dynstr           STRTAB          080481c0 0001c0 000059 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          0804821a 00021a 00000a 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         08048224 000224 000020 00   A  5   1  4
  [ 8] .rel.dyn          REL             08048244 000244 000008 08   A  4   0  4
  [ 9] .rel.plt          REL             0804824c 00024c 000008 08   A  4   b  4
  [10] .init             PROGBITS        08048254 000254 000017 00  AX  0   0  4
  [11] .plt              PROGBITS        0804826c 00026c 000020 04  AX  0   0  4
  [12] .text             PROGBITS        08048290 000290 0001d0 00  AX  0   0 16
  [13] .fini             PROGBITS        08048460 000460 00001b 00  AX  0   0  4
  [14] .rodata           PROGBITS        0804847c 00047c 000008 00   A  0   0  4
  [15] .data             PROGBITS        08049484 000484 00000c 00  WA  0   0  4
  [16] .eh_frame         PROGBITS        08049490 000490 000004 00   A  0   0  4
  [17] .dynamic          DYNAMIC         08049494 000494 0000c8 08  WA  5   0  4
  [18] .ctors            PROGBITS        0804955c 00055c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        08049564 000564 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        0804956c 00056c 000004 00  WA  0   0  4
  [21] .got              PROGBITS        08049570 000570 000014 04  WA  0   0  4
  [22] .bss              NOBITS          08049584 000584 000004 00  WA  0   0  4
  [23] .comment          PROGBITS        00000000 000584 000103 00      0   0  1
  [24] .debug_aranges    PROGBITS        00000000 000688 000078 00      0   0  8
  [25] .debug_pubnames   PROGBITS        00000000 000700 000025 00      0   0  1
  [26] .debug_info       PROGBITS        00000000 000725 000b45 00      0   0  1
  [27] .debug_abbrev     PROGBITS        00000000 00126a 000138 00      0   0  1
  [28] .debug_line       PROGBITS        00000000 0013a2 0002b7 00      0   0  1
  [29] .debug_str        PROGBITS        00000000 001659 0006df 01  MS  0   0  1
  [30] .shstrtab         STRTAB          00000000 001d38 00011e 00      0   0  1
  [31] .symtab           SYMTAB          00000000 002380 0006a0 10     32  52  4
  [32] .strtab           STRTAB          00000000 002a20 000487 00      0   0  1

The first thing to note is that the .text section has gained an extra
0xb0 bytes. The main function has also leaped to a size of 0x1b. This is
partly because of a curious portion of code between the prolog and the
return:

0x0804835a <main+6>:    and    $0xfffffff0,%esp
0x0804835d <main+9>:    mov    $0x0,%eax
0x08048362 <main+14>:   sub    %eax,%esp

This has the effect of 16-byte aligning %esp. The rest of the size
increase is due to a string of 10 nops following the epilog.

The entry point of the binary still points towards _start:

parabola:~# readelf -h scenario | grep Entry
  Entry point address:               0x8048290

0x8048290 <_start>:     0x31

The _start function is directly comparable to the alpha system's; it
simply calls __libc_start_main. Although the code for __libc_start_main
has changed, it still provides the same functionality. Similarly, _init
calls the same three initialization functions: call_gmon_start,
frame_dummy, __do_global_ctors_aux. The functionality of these three
functions is also unchanged. The cleanup function _fini is still
registered by __libc_start_main, and it still simply calls
__do_global_dtors_aux, which is unchanged from the alpha system.

The following is a layout of the .text section on this system:

0x8048290 <_start>
0x80482b4 <call_gmon_start>
0x80482e0 <__do_global_dtors_aux>
0x8048320 <frame_dummy>
0x8048354 <main>
0x8048370 <__libc_csu_init>
0x80483d0 <__libc_csu_fini>
0x8048420 <__i686.get_pc_thunk.bx>
0x8048430 <__do_global_ctors_aux>

The first thing to notice is the lack of fini_dummy and init_dummy
symbols. We must also consider the addition of three extra symbols:
__libc_csu_init, __libc_csu_fini and __i686.get_pc_thunk.bx.

The 15-byte symbol __i686.get_pc_thunk.bx at first appears to be
padding, but actually serves in PIC register loading (the .bx in fact
represents the ebx register). This PIC register loading occurs in both
the __libc_csu_init and __libc_csu_fini functions. These functions are
wrappers for _init and _fini respectively. On a second examination of
the _start procedure it can be seen that the addresses of these
functions are passed to __libc_start_main instead of the addresses of
_init and _fini themselves. The intention in doing this is to allow the
linker to pass "init_array" and "fini_array" function pointers to allow
for init/fini hooks. The glibc developers call this "Startup support for
ELF initializers/finalizers in the main executable".
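
To illustrate the wrapper idea, here is a rough sketch of how such a
pair of functions might combine the old-style _init/_fini with arrays of
hook pointers. It is written from the description above rather than
lifted from glibc, and it assumes that the init hooks run after _init
and that the fini hooks run in reverse order before _fini. The names
sketch_csu_init, sketch_csu_fini and the hook_* demo functions are
invented for the example.

```c
#include <stddef.h>

typedef void (*hook_fn)(void);

/*
 * Hypothetical sketch of a __libc_csu_init-style wrapper: run the
 * legacy initializer, then every entry of an init_array-like table.
 */
void sketch_csu_init(hook_fn legacy_init, hook_fn *array, size_t count)
{
    size_t i;

    if (legacy_init != NULL)
        legacy_init();              /* stands in for _init */
    for (i = 0; i < count; i++)
        array[i]();                 /* hooks, in table order */
}

/*
 * Hypothetical sketch of the matching fini wrapper: run the table
 * backwards, then the legacy finalizer.
 */
void sketch_csu_fini(hook_fn legacy_fini, hook_fn *array, size_t count)
{
    size_t i = count;

    while (i-- > 0)
        array[i]();                 /* hooks, last entry first */
    if (legacy_fini != NULL)
        legacy_fini();              /* stands in for _fini */
}

/* Demo hooks that record the order in which they were called. */
static char trace[8];
static size_t trace_len;

static void note(char c)
{
    if (trace_len < sizeof trace - 1)   /* keep a terminating NUL */
        trace[trace_len++] = c;
}

void hook_legacy(void) { note('L'); }
void hook_a(void)      { note('a'); }
void hook_b(void)      { note('b'); }
```

With a table holding hook_a and hook_b, sketch_csu_init(hook_legacy,
table, 2) records the legacy function first and then the hooks in table
order, while sketch_csu_fini records the hooks in reverse and the legacy
function last.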

System Gamma
------------

CPRU: Pentium 100 (i386)
OPSY: Linux 2.6.9
GCCV: 3.3.4 (Slackware)
LIBC: 2.3.2

root@parabola:~# readelf -S scenario
There are 33 section headers, starting at offset 0x1a10:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048128 000128 000020 00   A  0   0  4
  [ 3] .hash             HASH            08048148 000148 000028 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          08048170 000170 000050 10   A  5   1  4
  [ 5] .dynstr           STRTAB          080481c0 0001c0 000059 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          0804821a 00021a 00000a 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         08048224 000224 000020 00   A  5   1  4
  [ 8] .rel.dyn          REL             08048244 000244 000008 08   A  4   0  4
  [ 9] .rel.plt          REL             0804824c 00024c 000008 08   A  4   b  4
  [10] .init             PROGBITS        08048254 000254 000017 00  AX  0   0  4
  [11] .plt              PROGBITS        0804826c 00026c 000020 04  AX  0   0  4
  [12] .text             PROGBITS        08048290 000290 000180 00  AX  0   0 16
  [13] .fini             PROGBITS        08048410 000410 00001b 00  AX  0   0  4
  [14] .rodata           PROGBITS        0804842c 00042c 000008 00   A  0   0  4
  [15] .data             PROGBITS        08049434 000434 00000c 00  WA  0   0  4
  [16] .eh_frame         PROGBITS        08049440 000440 000004 00   A  0   0  4
  [17] .dynamic          DYNAMIC         08049444 000444 0000c8 08  WA  5   0  4
  [18] .ctors            PROGBITS        0804950c 00050c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        08049514 000514 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        0804951c 00051c 000004 00  WA  0   0  4
  [21] .got              PROGBITS        08049520 000520 000014 04  WA  0   0  4
  [22] .bss              NOBITS          08049534 000534 000004 00  WA  0   0  4
  [23] .comment          PROGBITS        00000000 000534 00007e 00      0   0  1
  [24] .debug_aranges    PROGBITS        00000000 0005b8 000058 00      0   0  8
  [25] .debug_pubnames   PROGBITS        00000000 000610 000025 00      0   0  1
  [26] .debug_info       PROGBITS        00000000 000635 00096e 00      0   0  1
  [27] .debug_abbrev     PROGBITS        00000000 000fa3 000124 00      0   0  1
  [28] .debug_line       PROGBITS        00000000 0010c7 0001ca 00      0   0  1
  [29] .debug_str        PROGBITS        00000000 001291 00065f 01  MS  0   0  1
  [30] .shstrtab         STRTAB          00000000 0018f0 00011e 00      0   0  1
  [31] .symtab           SYMTAB          00000000 001f38 000690 10     32  52  4
  [32] .strtab           STRTAB          00000000 0025c8 000321 00      0   0  1

Text section layout (descriptions found above):

0x8048290 <_start>
0x80482b4 <call_gmon_start>
0x80482e0 <__do_global_dtors_aux>
0x8048320 <frame_dummy>
0x8048354 <main>
0x8048370 <__libc_csu_init>
0x80483a0 <__libc_csu_fini>
0x80483e0 <__do_global_ctors_aux>

The main function is identical to the main function of the beta system.
Since there is no __i686.get_pc_thunk.bx symbol here, the
__libc_csu_init and __libc_csu_fini functions clearly do not call any PIC
register loading functions. The _init and _fini functions are identical
to the previous systems, as are call_gmon_start, frame_dummy and the
constructor/destructor handlers.

System Delta
------------

CPRU: AMD Duron 800MHz (i686)
OPSY: Linux 2.4.21-grsec
GCCV: 3.3.5 (Debian 1:3.3.5-2)
LIBC: 2.3.2

mantra:~# readelf -S scenario
There are 33 section headers, starting at offset 0x1e58:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048128 000128 000020 00   A  0   0  4
  [ 3] .hash             HASH            08048148 000148 000028 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          08048170 000170 000050 10   A  5   1  4
  [ 5] .dynstr           STRTAB          080481c0 0001c0 000059 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          0804821a 00021a 00000a 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         08048224 000224 000020 00   A  5   1  4
  [ 8] .rel.dyn          REL             08048244 000244 000008 08   A  4   0  4
  [ 9] .rel.plt          REL             0804824c 00024c 000008 08   A  4   b  4
  [10] .init             PROGBITS        08048254 000254 000017 00  AX  0   0  4
  [11] .plt              PROGBITS        0804826c 00026c 000020 04  AX  0   0  4
  [12] .text             PROGBITS        08048290 000290 0001d0 00  AX  0   0 16
  [13] .fini             PROGBITS        08048460 000460 00001b 00  AX  0   0  4
  [14] .rodata           PROGBITS        0804847c 00047c 000008 00   A  0   0  4
  [15] .data             PROGBITS        08049484 000484 00000c 00  WA  0   0  4
  [16] .eh_frame         PROGBITS        08049490 000490 000004 00   A  0   0  4
  [17] .dynamic          DYNAMIC         08049494 000494 0000c8 08  WA  5   0  4
  [18] .ctors            PROGBITS        0804955c 00055c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        08049564 000564 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        0804956c 00056c 000004 00  WA  0   0  4
  [21] .got              PROGBITS        08049570 000570 000014 04  WA  0   0  4
  [22] .bss              NOBITS          08049584 000584 000004 00  WA  0   0  4
  [23] .comment          PROGBITS        00000000 000584 000103 00      0   0  1
  [24] .debug_aranges    PROGBITS        00000000 000688 000078 00      0   0  8
  [25] .debug_pubnames   PROGBITS        00000000 000700 000025 00      0   0  1
  [26] .debug_info       PROGBITS        00000000 000725 000b45 00      0   0  1
  [27] .debug_abbrev     PROGBITS        00000000 00126a 000138 00      0   0  1
  [28] .debug_line       PROGBITS        00000000 0013a2 0002b7 00      0   0  1
  [29] .debug_str        PROGBITS        00000000 001659 0006df 01  MS  0   0  1
  [30] .shstrtab         STRTAB          00000000 001d38 00011e 00      0   0  1
  [31] .symtab           SYMTAB          00000000 002380 0006a0 10     32  52  4
  [32] .strtab           STRTAB          00000000 002a20 000489 00      0   0  1

Text section layout (descriptions found above):

0x8048290 <_start>
0x80482b4 <call_gmon_start>
0x80482e0 <__do_global_dtors_aux>
0x8048320 <frame_dummy>
0x8048354 <main>
0x8048370 <__libc_csu_init>
0x80483d0 <__libc_csu_fini>
0x8048420 <__i686.get_pc_thunk.bx>
0x8048430 <__do_global_ctors_aux>

All equivalent to the beta system.

System Epsilon
--------------

Just for fun... perhaps a glimpse into a future document?

CPRU: P4 3GHz (i686)
OPSY: 1.6.2_STABLE NetBSD
GCCV: 2.95.3 20010315 (release) (NetBSD nb3)
LIBC: 12.83.3

0x8048574 <_start>
0x804858c <___start>
0x8048660 <_rtld_setup>
0x80486f8