The story of exploiting kmalloc() overflows v.0.2.1
- qobaiashi
I. Intro
""""""""
Recently many kernel level heap (kmalloc) overflows have been discovered
which were rated "unclear" with regard to exploitability. This paper aims at
explaining the kernel's heap management, with an eye on security and on
exploiting heap overflows in kernel space.
II. The Slab allocator
""""""""""""""""""""""
The slab allocator serves the need for small, dynamic and thus fragmented
memory portions in kernel space. Pages requested from the buddy system
(0x1000 bytes each) are further split into caches, which hold slabs, which
in turn hold objects - the smallest unit of memory, pointed to by a returned
kmalloc() pointer. To increase performance the slab allocator keeps several
objects of the same size together in so-called slabs, so that a just-freed
object can - without touching the buddy system - be handed out again, for
example to a driver module which repeatedly allocates objects of the same
size.
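To make this concrete, here is a minimal sketch (the names and the chosen
size are just an example) of a 2.4-style module obtaining and releasing such
an object - the pointer returned by kmalloc() points straight at an object
inside a slab of the general purpose size-64 cache:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/errno.h>

static void *obj;

int init_module(void)
{
        /* 64 bytes -> served from the general purpose size-64 cache */
        obj = kmalloc(64, GFP_KERNEL);
        if (obj == NULL)
                return -ENOMEM;
        printk("got object at %p inside some slab\n", obj);
        return 0;
}

void cleanup_module(void)
{
        /* the object only goes back to the cache's free list; the slab's
           page is not returned to the buddy system right away */
        kfree(obj);
}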
II.I Caches
"""""""""""
+-------+--->+-------+----->+-------+
| CACHE |    | CACHE |      | CACHE |
|       |<---|       |<-----|       |
+---+---+    +---+---+      +---+---+
    |
    V
+--------------------+
| SLAB | SLAB | SLAB |
|------|------|------|
|  O   |  B   |  J   |
|------|------|------|
|  E   |  C   |  T   |
|------|------|------|
|  S   |      |      |
+--------------------+
Fig. I.
"""""""
A list of all active caches can be obtained from /proc/slabinfo:
qobaiashi@cocoon:~> cat /proc/slabinfo
slabinfo - version: 1.1 (SMP)
kmem_cache 80 80 244 5 5 1 : 252 126
fib6_nodes 113 113 32 1 1 1 : 252 126
ip6_dst_cache 20 20 192 1 1 1 : 252 126
ndisc_cache 30 30 128 1 1 1 : 252 126
hpsb_packet 0 0 100 0 0 1 : 252 126
ip_fib_hash 113 113 32 1 1 1 : 252 126
clip_arp_cache 0 0 128 0 0 1 : 252 126
ip_mrt_cache 0 0 96 0 0 1 : 252 126
tcp_tw_bucket 30 30 128 1 1 1 : 252 126
tcp_bind_bucket 113 113 32 1 1 1 : 252 126
tcp_open_request 40 40 96 1 1 1 : 252 126
ip_dst_cache 20 20 192 1 1 1 : 252 126
arp_cache 30 30 128 1 1 1 : 252 126
blkdev_requests 1560 1560 96 39 39 1 : 252 126
nfs_write_data 0 0 384 0 0 1 : 124 62
nfs_read_data 0 0 352 0 0 1 : 124 62
nfs_page 0 0 96 0 0 1 : 252 126
ext2_xattr 0 0 44 0 0 1 : 252 126
kioctx 0 0 128 0 0 1 : 252 126
kiocb 0 0 96 0 0 1 : 252 126
eventpoll pwq 0 0 36 0 0 1 : 252 126
eventpoll epi 0 0 96 0 0 1 : 252 126
dnotify_cache 338 338 20 2 2 1 : 252 126
file_lock_cache 40 40 96 1 1 1 : 252 126
async poll table 0 0 144 0 0 1 : 252 126
fasync_cache 126 202 16 1 1 1 : 252 126
uid_cache 113 113 32 1 1 1 : 252 126
skbuff_head_cache 80 80 192 4 4 1 : 252 126
sock 216 216 1344 72 72 1 : 60 30
sigqueue 28 28 136 1 1 1 : 252 126
kiobuf 0 0 64 0 0 1 : 252 126
cdev_cache 531 531 64 9 9 1 : 252 126
bdev_cache 59 59 64 1 1 1 : 252 126
mnt_cache 59 59 64 1 1 1 : 252 126
inode_cache 20808 20808 512 2601 2601 1 : 124 62
dentry_cache 23010 23010 128 767 767 1 : 252 126
dquot 0 0 128 0 0 1 : 252 126
filp 1110 1110 128 37 37 1 : 252 126
names_cache 8 8 4096 8 8 1 : 60 30
buffer_head 32310 32310 128 1077 1077 1 : 252 126
mm_struct 48 48 160 2 2 1 : 252 126
vm_area_struct 1904 2408 68 43 43 1 : 252 126
fs_cache 59 59 64 1 1 1 : 252 126
files_cache 54 54 416 6 6 1 : 124 62
signal_act 51 51 1312 17 17 1 : 60 30
pae_pgd 113 113 32 1 1 1 : 252 126
size-131072(DMA) 0 0 131072 0 0 32 : 0 0
size-131072 0 0 131072 0 0 32 : 0 0
size-65536(DMA) 0 0 65536 0 0 16 : 0 0
size-65536 20 20 65536 20 20 16 : 0 0
size-32768(DMA) 0 0 32768 0 0 8 : 0 0
size-32768 3 3 32768 3 3 8 : 0 0
size-16384(DMA) 0 0 16384 0 0 4 : 0 0
size-16384 0 5 16384 0 5 4 : 0 0
size-8192(DMA) 0 0 8192 0 0 2 : 0 0
size-8192 5 10 8192 5 10 2 : 0 0
size-4096(DMA) 0 0 4096 0 0 1 : 60 30
size-4096 40 40 4096 40 40 1 : 60 30
size-2048(DMA) 0 0 2048 0 0 1 : 60 30
size-2048 20 20 2048 10 10 1 : 60 30
size-1024(DMA) 0 0 1024 0 0 1 : 124 62
size-1024 92 92 1024 23 23 1 : 124 62
size-512(DMA) 0 0 512 0 0 1 : 124 62
size-512 104 104 512 13 13 1 : 124 62
size-256(DMA) 0 0 256 0 0 1 : 252 126
size-256 75 75 256 5 5 1 : 252 126
size-128(DMA) 0 0 128 0 0 1 : 252 126
size-128 900 900 128 30 30 1 : 252 126
size-64(DMA) 0 0 64 0 0 1 : 252 126
size-64 3835 3835 64 65 65 1 : 252 126
size-32(DMA) 0 0 32 0 0 1 : 252 126
size-32 904 904 32 8 8 1 : 252 126
This list tells you the cache name, the number of active objects, the total
number of objects, the size of the managed objects, the number of active
slabs, the total number of slabs and the number of pages per slab (on SMP,
the two values after the colon are the per-CPU array limit and batchcount).
Here you can see that the kernel allocates special caches for filesystem
drivers, network structures etc. as well as general purpose caches (size-*),
each in a DMA-suitable and an ordinary-memory flavour. All caches are kept
in a doubly linked list, which makes it easier to traverse them in order to
find a suitable cache for an allocation.
Every cache holds three linked lists of slabs: one for free, one for
partially used and one for full slabs. Additionally, on SMP every cache has
a per-CPU array pointing to free objects in the slabs, managed LIFO style
(a just kfree'd object is still cache-hot and should be handed out again as
soon as possible).
+-------+
| CACHE |
|-------|                   +-------+    +----------------+
|       |------------------>| CPU_0 |--->| Array w/ ptrs  |
|       |                   | CPU_N |    | to unused objs |
| free  |-->[ SLAB HEAD ]   +-------+    | in slabs       |
|       |                                +----------------+
|partial|-------------->+--------+---->+--------+---->+--------+---->+-
|       |<--------------| SLAB   |<----| SLAB   |<----| SLAB   |<----|
| full  |--[ SLAB HEAD ]| HEAD   |     | HEAD   |     | HEAD   |     |
|       |               +--------+     +--------+     +--------+     +
+-------+               |        |     |        |     |        |     |
                        | obj    |     | obj    |     | obj    |     |
                        .        .     .        .     .        .     .
                        .        .     .        .     .        .     .
Fig. II
"""""""
:
/*
 * kmem_cache_t
 *
 * manages a cache.
 */

#define CACHE_NAMELEN 20        /* max name length for a slab cache */

struct kmem_cache_s {
/* 1) each alloc & free */
        /* full, partial first, then free */
        struct list_head        slabs_full;
        struct list_head        slabs_partial;
        struct list_head        slabs_free;
        unsigned int            objsize;
        unsigned int            flags;          /* constant flags */
        unsigned int            num;            /* # of objs per slab */
        spinlock_t              spinlock;
#ifdef CONFIG_SMP
        unsigned int            batchcount;
#endif

/* 2) slab additions /removals */
        /* order of pgs per slab (2^n) */
        unsigned int            gfporder;

        /* force GFP flags, e.g. GFP_DMA */
        unsigned int            gfpflags;

        size_t                  colour;         /* cache colouring range */
        unsigned int            colour_off;     /* colour offset */
        unsigned int            colour_next;    /* cache colouring */
        kmem_cache_t            *slabp_cache;
        unsigned int            growing;
        unsigned int            dflags;         /* dynamic flags */

        /* constructor func */
        void (*ctor)(void *, kmem_cache_t *, unsigned long);

        /* de-constructor func */
        void (*dtor)(void *, kmem_cache_t *, unsigned long);

        unsigned long           failures;

/* 3) cache creation/removal */
        char                    name[CACHE_NAMELEN];
        struct list_head        next;
#ifdef CONFIG_SMP
/* 4) per-cpu data */
        cpucache_t              *cpudata[NR_CPUS];
#endif
#if STATS
        unsigned long           num_active;
        unsigned long           num_allocations;
        unsigned long           high_mark;
        unsigned long           grown;
        unsigned long           reaped;
        unsigned long           errors;
#ifdef CONFIG_SMP
        atomic_t                allochit;
        atomic_t                allocmiss;
        atomic_t                freehit;
        atomic_t                freemiss;
#endif
#endif
};
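The cpudata[] pointers above refer to small cpucache_t headers; the per-CPU
array with the pointers to free objects shown in Fig. II is stored directly
behind such a header (also from mm/slab.c):

typedef struct cpucache_s {
        unsigned int avail;     /* number of object pointers currently stored */
        unsigned int limit;     /* capacity before objects are flushed back   */
} cpucache_t;

/* the object pointers follow the header in memory */
#define cc_entry(cpucache) \
        ((void **)(((cpucache_t*)(cpucache))+1))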
II.II Slabs
"""""""""""
As seen in Fig.I slabs hold objects of the same size. The head of each slab looks as
follows:
/*
 * struct list_head {
 *         struct list_head *next, *prev;
 * };
 * typedef struct list_head list_t;
 */
:
typedef struct slab_s {
        struct list_head        list;
        unsigned long           colouroff;
        void                    *s_mem;
        unsigned int            inuse;
        kmem_bufctl_t           free;   /* typedef unsigned int kmem_bufctl_t; */
} slab_t;
Each (on-slab) slab header is located at the beginning of its slab, which is
PAGE_SIZE aligned. Every object in the slab is sizeof(void *) aligned to
increase access speed. After this header follows an array containing one int
value per object. These values are only meaningful for currently free
objects, where each holds the index of the next free object in the slab.
The value BUFCTL_END (slab.c: #define BUFCTL_END 0xffffFFFF) marks the end
of this free list.
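To illustrate, this is - condensed - how the 2.4.x allocation fast path
(kmem_cache_alloc_one_tail() in mm/slab.c) walks that index array;
slab_bufctl() addresses the int array directly behind the slab header:

#define slab_bufctl(slabp) \
        ((kmem_bufctl_t *)(((slab_t*)slabp)+1))

        /* hand out the first free object of this slab ... */
        slabp->inuse++;
        objp = slabp->s_mem + slabp->free*cachep->objsize;
        /* ... and advance 'free' to the index of the next free object */
        slabp->free = slab_bufctl(slabp)[slabp->free];

        if (slabp->free == BUFCTL_END) {
                /* no free objects left: move the slab to the full list */
                list_del(&slabp->list);
                list_add(&slabp->list, &cachep->slabs_full);
        }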
"couloroff" describes "offsetting the slab_t structure into the slab area to maximize cache
alignment." (slab.c)
It is also possible for the slab header to be stored "off-slab" in a
separately allocated object. Thanks to the *s_mem member of the slab_t
struct it does not matter where the slab head is stored, since it always
points to the beginning of the objects of its slab. The decision between
on- and off-slab is made in kmem_cache_create():
---8<---
        /* Determine if the slab management is 'on' or 'off' slab. */
        if (size >= (PAGE_SIZE>>3))             // if (requested size >= 512)
                /*
                 * Size is large, assume best to place the slab management obj
                 * off-slab (should allow better packing of objs).
                 */
                flags |= CFLGS_OFF_SLAB;        // a special flag is set
---8<---
// So off-slab is chosen if the requested object size is >= 512 bytes
// (PAGE_SIZE>>3 with 0x1000-byte pages). BUT:
---8<---
        /*
         * If the slab has been placed off-slab, and we have enough space then
         * move it on-slab. This is at the expense of any extra colouring.
         */
        if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
                flags &= ~CFLGS_OFF_SLAB;
                left_over -= slab_size;
        }
---8<---
If the management header fits into the space left over in the slab, it is
moved back on-slab after all!
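A quick sanity check with the size-128 cache from the slabinfo above (one
4096-byte page per slab; on a 32-bit box sizeof(slab_t) = 24 and each bufctl
entry takes 4 bytes): 30 objects need 24 + 30*4 + 30*128 = 3984 bytes and
fit into the page, while a 31st object (24 + 31*4 + 31*128 = 4116) would
not. That matches the 30 objects per slab (900 objects / 30 slabs) shown in
/proc/slabinfo, and since 128 < 512 the header indeed stays on-slab.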
Exploitation note:
""""""""""""""""""
as we are writing towards higher memory addresses it is NOT possible to
overwrite the header of the current slab (it was written below our object):
<--0x00                                   .-overflowing obj
[SLAB HEADER][COLOUR][obj 1][obj 2][ctrl_][aaaaa][aaaaa][aaaaa][aaaaa]  0xff-->
but only objects allocated after "us" inside our slab. Here we must keep in
mind that the kernel tries to balance the load between all partial slabs, so
all "partial" slabs should end up with a similar count of in-use objects.....
the problem is that, without further preparation, we can only overflow
*random* other objects that have been allocated by other kernel routines
after our bug trigger.
randomness or increasing reliability:
"""""""""""""""""""""""""""""""""""""
by carefully consuming objects of a certain cache you can - with the above
in mind - control quite well which object comes to lie next to your
overflowing object, and thus exploit the subsequent use of that other
allocated object fairly reliably...
..got it.. use your brains!
it's time to find kernel functions you can trigger from userspace that
allocate objects in the cache your overflow lives in - a sketch of the idea
follows.
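A hedged sketch of the grooming idea, assuming the overflow sits in one of
the general purpose size-* caches: on 2.4, sys_semget() kmallocs a sem_array
of sizeof(struct sem_array) + nsems*sizeof(struct sem) bytes (ipc/sem.c), so
the nsems argument selects the target cache and every call consumes exactly
one object there. NSEMS must be tuned to your kernel's struct sizes and
trigger_bug() is a placeholder for the actual overflow:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define NSEMS   4       /* tune me: selects the kmalloc'd size / cache */
#define GROOM 100       /* number of objects to consume in that cache  */

int main(void)
{
        int id[GROOM], i;

        /* use up the free objects of all partial slabs so that fresh
           slabs get allocated and subsequent allocations in this cache
           become adjacent and predictable */
        for (i = 0; i < GROOM; i++)
                id[i] = semget(IPC_PRIVATE, NSEMS, IPC_CREAT | 0600);

        /* trigger_bug();  <- placeholder: the overflowing object should
           now land directly in front of the next object allocated in
           the same cache */

        /* cleanup would be: semctl(id[i], 0, IPC_RMID, 0) for each i */
        return 0;
}

Freeing a few of the groomed objects right before the trigger then lets the
overflowing allocation and its victim drop into known holes next to each
other.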
!and erm successful exploitation is actually possible!
have fun!
-q