My name is Nicolas Economou and I am a senior member on the Exploit Writing Team here at CORE Labs - specializing in Windows kernel exploitation - where we work tirelessly to discover vulnerabilities within countless technologies so we can provide our customers with new tools to test and assess their networks.
In this post, I'm going to share a deeply technical explanation of the challenging work involved in exploiting a Windows-based vulnerability I discovered (CVE-2012-0181) and how it was exploited on Windows 2003, Windows Vista and Windows 2008. Although this is the first publication of these findings, you might be interested to know that the attack vector is the same one used by the infamous Stuxnet worm to exploit another bug (MS10-073, CVE-2010-2743).
(NOTE: The results of this research are included in a CORE Labs Advisory published earlier this week entitled: "Windows Kernel ReadLayoutFile Heap Overflow: CORE-2011-1123".)
The syntax of the 'LoadKeyboardLayout' function is:
HKL WINAPI LoadKeyboardLayout(__in LPCTSTR pwszKLID, __in UINT Flags);
Here, the string 'pwszKLID' holds the identity of the keyboard to be loaded (e.g., the English layout is '00000409') and the parameter 'Flags' specifies how the input locale identifier is going to be loaded.
The 'LoadKeyboardLayout' function, located in 'USER32.DLL', is only the highest-level interface to load a new keyboard layout. The function is actually a wrapper for the win32k system call (syscall for short) 'NtUserLoadKeyboardLayoutEx'. Accessing this syscall is straightforward, and in Windows 2003 can be done with the following code:
Depending on the OS version, the system call accepts seven or eight parameters, whereas 'LoadKeyboardLayout' accepts only two. Of these additional parameters, there are two that play a role in this vulnerability:
- a handle of an opened keyboard layout file
- an index into the file
When the syscall is wrapped by 'LoadKeyboardLayout', these parameters are set before entering kernel mode. In particular, the handle to a "keyboard layout file" is set when the 'pwszKLID' parameter passed to 'LoadKeyboardLayout' is converted to a string with the name of a keyboard layout file (generally 'kbdXX.dll'), and this file is then opened. The index argument is an offset into the keyboard layout DLL passed as a handle.
What would happen if we called the 'NtUserLoadKeyboardLayoutEx' function (the syscall) directly instead of going through 'LoadKeyboardLayout'? We could then pass the handle we want and the index value we want... but does this have any security implications?
NtUserLoadKeyboardLayoutEx - parameters
In general, the keyboard layout files are DLLs that export only one function ('KbdLayerDescriptor') and which are located in the "windows\system32" directory. The index, on the other hand, is an offset relative to the DLL’s base, and it is used to refresh a pointer table ('LayerDescriptor') located in the kernel’s .DATA section.
Windows XP and the other OSes
After the release of XP, Microsoft introduced some changes in how the syscall operates.
Windows XP:
- The file location parameter (handle) isn't checked (i.e., we can pass a file of our design, and in particular, a rootkit).
- The index into the file isn't checked (i.e., we can pass an index out of the range used by the file mapped in kernel memory, where it is easier to write attacker-generated code).
The rest of the OSes:
- The file location parameter (handle) is checked (i.e., it must be located in "windows\system32").
- The index into the file is checked (i.e., it must be within the bounds of the kernel memory mapped at the .DATA section).
Hence, exploitation for XP becomes easier. Next, we’ll explain how to treat the latter cases. Namely: how can we take advantage of this syscall in the cases of non-XP operating systems with the above restrictions?
The bug is located in the 'ReadLayoutFile' function, which is part of the 'win32k.sys' kernel module. This function has a custom-built PE loader. When the index parameter is used by the PE loader, the bounds are checked as follows:
- 'base_addr' is the kernel memory address where the .DATA section was copied.
- index is the relative offset, starting from the 'base_addr', where the keyboard layout descriptor is located.
- 'limit_addr' is the kernel memory address where the .DATA section ends.
- 'LayerDescriptor' is a table consumed by the PE loader with 12 or 13 pointers to the different functions in the DLL. These pointers must be refreshed using 'base_addr'.
Now, what happens if 'base_addr + index' is set to a value 1, 2 or 3 bytes below 'limit_addr'? (Going further below may cause other checks to fail, so it is not advised.) For example:
If this pointer is in the bounds of the keyboard layout, the "PE loader" continues with the same process, until the table is refreshed (12 or 13 pointers). If this pointer is out of the bounds, the function 'ReadLayoutFile' deallocates the keyboard layout and returns ERROR.
Since we control index and the range check is insufficient, the bug can be used to overwrite one, two or three bytes beyond the memory allocated by Windows. This kind of bug is called an off-by-one bug, or more generically: memory corruption.
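The listing with the original check is not reproduced here, so the following is only a minimal sketch of the idea in Python. It assumes 32-bit little-endian pointers and a check that validates only the start of the write; the function name 'refresh_pointer' and the flat 'memory' buffer are hypothetical, not win32k code.

```python
import struct

POINTER_SIZE = 4  # 32-bit kernel pointers (x86), an assumption of this sketch


def refresh_pointer(memory, base_addr, limit_addr, index, value):
    """Model of an insufficient bounds check (hypothetical, not win32k code).

    Only the first byte of the write is compared against limit_addr, yet a
    full little-endian pointer is stored. Returns how many bytes land at or
    past limit_addr.
    """
    target = base_addr + index
    if not (base_addr <= target < limit_addr):
        raise ValueError("index rejected by the bounds check")
    memory[target:target + POINTER_SIZE] = struct.pack("<I", value)
    return max(0, target + POINTER_SIZE - limit_addr)


# 0x178 bytes of layout data followed by the 8-byte header of the next chunk.
memory = bytearray(0x178 + 8)
for distance in (1, 2, 3):
    spilled = refresh_pointer(memory, 0, 0x178, 0x178 - distance, 0x8ABCDEF0)
    print(f"limit_addr - {distance}: {spilled} byte(s) written out of bounds")
```

In this model, a write that starts 1, 2 or 3 bytes below the limit spills 3, 2 or 1 bytes respectively, while any write starting at least 4 bytes below the limit stays in bounds.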
A little bit of theory first
This picture depicts a snapshot of the paged memory pool in the Windows kernel heap:
Each of the pages in the paged pool represents a memory block. Each block holds 4096 bytes (4 kB). When writing into the paged memory pool (e.g., "mallocs" executed by the Windows kernel itself), one necessarily takes the whole block into memory.
The paged (memory) pool is a memory resource that the OS uses to store data: “The kernel’s pool manager operates similarly to the C-runtime and Windows heap managers that execute within user-mode processes. Because the minimum virtual memory allocation size is a multiple of the system page size (4KB on x86 and x64), these subsidiary memory managers carve up larger allocations into smaller ones so that memory isn’t wasted.” When low in (virtual) memory, the system can thus write these pages to disk (physical) and free some space.
Let’s say we did use the 'NtUserLoadKeyboardLayoutEx' syscall; so that the .DATA section (of the keyboard layout) was allocated as one of the chunks depicted in Figure 1 (here the keyboard layout was mapped in a kernel memory address represented by a painted square):
Each memory chunk is depicted as a rectangle. The first eight bytes of each chunk contain a header that we represent by line separators:
In turn, the chunk header has the following structure:
The first half of the chunk header consists of two fields:
- The previous field holds an "unsigned short" value (16 bits). To locate the header of the chunk before this one, multiply this number by 8 and move backwards by that many bytes.
- The next field holds an "unsigned short" value (16 bits). To locate the header of the chunk after this one, multiply this number by 8 and move forwards by that many bytes.
Note that only 9 of these bits are actually used to represent the chunk size; the remaining 7 bits are used as an index which determines the block the chunk belongs to.
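The header layout can be sketched in a few lines of Python. This is an illustration under stated assumptions: the size is taken to occupy the low 9 bits of each 16-bit field and the index the high 7 bits, and the name 'decode_first_half' is hypothetical.

```python
import struct

SIZE_BITS = 9
SIZE_MASK = (1 << SIZE_BITS) - 1  # 0x1FF
GRANULARITY = 8                   # sizes are stored in units of 8 bytes


def decode_first_half(header):
    """Split the two 16-bit fields of a chunk header into size and index.

    Assumes little-endian fields with the size in the low 9 bits.
    """
    previous_raw, next_raw = struct.unpack_from("<HH", header)
    return {
        "previous_bytes": (previous_raw & SIZE_MASK) * GRANULARITY,
        "previous_index": previous_raw >> SIZE_BITS,
        "next_bytes": (next_raw & SIZE_MASK) * GRANULARITY,
        "next_index": next_raw >> SIZE_BITS,
    }


# Example header: previous = 0x30 units with index 3, next = 0x10 units.
header = struct.pack("<HHI", (3 << SIZE_BITS) | 0x30, 0x10, 0)
print(decode_first_half(header))
```

With 9 bits and a granularity of 8, the largest size a field can express is 0x1FF * 8 = 0xFF8 bytes, which fits inside one 4 kB page.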
After describing the setting we are ready to get into the most complex and interesting part: the exploitation. This will take a few steps and I’ll describe solving the several challenges as they appeared to me.
Step 1: HEAP Corruption - Part 1
As you may recall, using 'NtUserLoadKeyboardLayoutEx' we may overwrite 1, 2 or 3 bytes at the end of the chunk. Recall that at the end of each chunk, there follows a new chunk, that this chunk starts with a header, and that the first two bytes in this header are used to locate the previous chunk.
For example, if we managed to overwrite the header of the chunk after the keyboard layout file, then the previous field of this chunk’s header should point to the keyboard layout allocation:
Here, the red arrow represents what the bug allows us to overwrite.
Step 2: HEAP Corruption - Part 2
Let's analyze the impact of overwriting the first byte of the next chunk header. Consider, for example, that we are using a keyboard layout file of 0x178 bytes in size (so the associated chunk containing the file plus the header is 0x180 bytes in size).
Since 0x180 (384 decimal) divided by 8 is 0x30, it follows that the previous field should hold the value 0x30 (or 48 decimal). Suppose that we overwrote the first byte with another value, say 0x10; then the previous field would point to some place within the keyboard layout chunk:
Obviously this is unexpected behavior. Under this assumption, a new chunk would start at -0x80 (0x10 times 8 bytes before the next chunk header) with its own header:
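The arithmetic of this step can be checked with a short Python sketch. The helper names are hypothetical; the 8-byte header and the 8-byte granularity come from the description above.

```python
HEADER_SIZE = 8   # bytes taken by the chunk header
GRANULARITY = 8   # the previous/next fields count units of 8 bytes


def previous_field(data_size):
    """Value expected in the next chunk's previous field for our allocation."""
    chunk_size = data_size + HEADER_SIZE
    return (chunk_size + GRANULARITY - 1) // GRANULARITY


def backward_distance(previous_value):
    """Bytes between the next chunk header and the header it points back to."""
    return previous_value * GRANULARITY


expected = previous_field(0x178)
print(hex(expected))                 # legitimate previous value
print(hex(backward_distance(0x10)))  # distance after overwriting with 0x10
# Offset of the fake header from the start of the keyboard layout chunk:
print(hex(backward_distance(expected) - backward_distance(0x10)))
```

For the 0x178-byte file, the legitimate value is 0x30, the overwritten value 0x10 points 0x80 bytes back, and the fake header therefore sits 0x100 bytes into the 0x180-byte keyboard layout chunk.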
Step 3: "Exploiting" the HEAP
If we overwrite the previous field on the next chunk header, two very different things can happen depending on what chunk is deallocated next (memory will be freed sometime after the chunk is used):
- If the keyboard layout chunk is deallocated (by a call to the 'ExFreePoolWithTag' function), the Windows kernel produces a blue screen of death (BSoD). A check compares the size of the keyboard layout chunk, computed from the next field in its own header, with the value of the previous field in the header of the next chunk; since we overwrote the latter, the check fails.
- If the next chunk (one of the alloc chunks in Figure 8) is deallocated, a very interesting effect takes place: the 'ExFreePoolWithTag' function reads the previous field (that we have already overwritten) and arrives at a chunk header of our choice. This effect is known as heap coalesce.
When the chunk is deallocated, the memory looks like this:
So, if we controlled the memory area content where the fake header is read, we could control the value of these 2 pointers (A and B).
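Why two controlled pointers are so valuable can be sketched as follows. This is a simplified Python model, assuming the coalescing step removes the fake chunk from a doubly linked list in the usual Windows LIST_ENTRY fashion and that A and B play the roles of the forward and backward links; that mapping, the flat 'memory' buffer and all addresses are hypothetical.

```python
import struct


def read_ptr(memory, addr):
    return struct.unpack_from("<I", memory, addr)[0]


def write_ptr(memory, addr, value):
    struct.pack_into("<I", memory, addr, value)


def unlink(memory, entry_addr):
    """Classic doubly linked list removal: two writes through two pointers."""
    flink = read_ptr(memory, entry_addr)      # pointer A (assumed forward link)
    blink = read_ptr(memory, entry_addr + 4)  # pointer B (assumed backward link)
    write_ptr(memory, blink, flink)           # Blink->Flink = Flink
    write_ptr(memory, flink + 4, blink)       # Flink->Blink = Blink


# Toy address space; the fake chunk's list entry lives at offset 0x10.
memory = bytearray(0x100)
write_ptr(memory, 0x10, 0x40)  # A, attacker-chosen
write_ptr(memory, 0x14, 0x80)  # B, attacker-chosen
unlink(memory, 0x10)
print(hex(read_ptr(memory, 0x80)), hex(read_ptr(memory, 0x44)))
```

In this model the value of A ends up stored at the address held in B (and B at A + 4), so whoever chooses A and B chooses both what is written and where.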
The Big Problem
We can break the exploitation problem in the following three sub-problems.
First: When deallocation happens, we need the next chunk to be deallocated rather than the keyboard layout chunk. Deallocating the keyboard layout chunk would lead to a BSoD and end our hope of getting the exploit to work, so we need to somehow force the next chunk to be deallocated.
Second: After doing the off-by-one, we do not know the value of the first byte in the NEXT chunk header. We need to estimate this value so that we know where it will point.
Third: We don't control the (keyboard layout) DLL content, so where will we write our payload?
The implication of this is that we can't create a fake chunk header pointing into the DLL area, because this file is not writable. Hence, exploitation will have to go through some other means. On the other hand, we might be able to find, somewhere in the .DATA area, data consistent with a chunk header followed by two pointers and then have the byte used to overwrite the next chunk header to point to this (fake) chunk header.
Solving problems 2 and 3
Assume for now that we have solved problem 1; we’ll figure it out next. Then, problem 2 has a neat and interesting solution, which goes as follows. We know that the highest byte of the refreshed pointer is the first byte that we used to overwrite the next chunk header. We also know that the address points somewhere in the kernel, and therefore its value must be at or above 0x80000000. Hence, we can deduce that the highest byte will hold a value between 0x80 and 0xff.
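The deduction about the highest byte can be illustrated with a small Python sketch, assuming a 32-bit little-endian pointer of which only the last byte falls outside the chunk. The pointer values below are made up for the example.

```python
import struct


def spilled_bytes(pointer, bytes_inside):
    """Bytes of a little-endian 32-bit pointer that fall past the chunk end.

    bytes_inside is how many bytes of the pointer still fit in the chunk.
    """
    return struct.pack("<I", pointer)[bytes_inside:]


# Hypothetical kernel addresses; only the top byte leaves the chunk.
for pointer in (0x80000000, 0x8ABCDEF0, 0xBF800123, 0xFFFFFFFF):
    overflow = spilled_bytes(pointer, 3)
    print(hex(pointer), "->", hex(overflow[0]))
```

Because little-endian order stores the most significant byte last, the single byte that leaves the chunk is the top byte of the address, which for any kernel-space pointer lies between 0x80 and 0xff.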
For the heap coalesce, it is sufficient to control the memory area before the keyboard layout file and build the fake chunk header and the two pointers within that memory area. Assuming we can control allocations before the keyboard layout, we face the following situation:
To solve problem 3, note that we needn't overwrite the DLL content: it is sufficient to find a DLL whose .DATA section yields a chunk of less than 0x400 bytes. This is because the minimum byte value used to overwrite the next chunk header is 0x80 (128 decimal), and 0x80 times 8 is 0x400 (1024 decimal), so when the 'ExFreePoolWithTag' function looks for the fake chunk header, it lands outside the keyboard layout chunk.
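The bound can be verified for every possible overwrite byte with a short Python sketch. The function name is hypothetical, and the comparison assumes the fake header must fall strictly before the start of the keyboard layout chunk.

```python
GRANULARITY = 8


def lands_before_chunk(previous_byte, chunk_size):
    """True if the fake header falls before the keyboard layout chunk.

    previous_byte is the value written into the next chunk's previous field;
    chunk_size is the size of the keyboard layout chunk, header included.
    """
    return previous_byte * GRANULARITY > chunk_size


kernel_bytes = range(0x80, 0x100)
print(hex(min(kernel_bytes) * GRANULARITY), hex(max(kernel_bytes) * GRANULARITY))
print(all(lands_before_chunk(b, 0x180) for b in kernel_bytes))
```

The backward distance ranges from 0x400 to 0x7f8 bytes, so for the 0x180-byte chunk used as an example earlier, every possible byte value points into memory allocated before the keyboard layout.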
The Big Trick
Problem 1 is much more difficult because there isn’t a way to avoid the deallocation of the keyboard layout chunk. The solution is achieved using a neat, surgically crafted heap spray technique. Say that the keyboard layout is allocated exactly at the end of the memory page like this:
If this happens, the overwritten next chunk header is allocated in the next memory page (next 4 kB).
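The placement condition is simple to state in code. Below is a minimal Python sketch, assuming 4 kB pages; the addresses are made up for illustration and the function name is hypothetical.

```python
PAGE_SIZE = 0x1000  # 4 kB pages


def ends_on_page_boundary(chunk_addr, chunk_size):
    """True if the chunk is the last one in its page.

    In that case the header that follows it is the first header of the
    next page.
    """
    return (chunk_addr + chunk_size) % PAGE_SIZE == 0


# Hypothetical paged pool addresses for a 0x180-byte keyboard layout chunk.
print(ends_on_page_boundary(0xE1000E80, 0x180))  # last chunk of its page
print(ends_on_page_boundary(0xE1000800, 0x180))  # somewhere in the middle
```

A heap spray built around this condition would aim to leave a free slot of exactly the chunk size at page offset 0x1000 minus that size (0xE80 for a 0x180-byte chunk).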
In this case, the keyboard layout chunk will inevitably be deallocated, but Windows will not perform the sanity check since it is not needed.
In this way we can avoid the BSoD as 'ExFreePoolWithTag' function deallocates the (fake/corrupted) next chunk. Since the next chunk header starts exactly at the beginning of the next memory page, the previous field value should be 0.
Even though the chunk is located at the start of the page, the 'ExFreePoolWithTag' function will use the previous field to find a previous chunk header. So, this innocent-looking check allows us to use a fake chunk header and point to the previous memory page, the page where the keyboard layout was allocated.
This is illustrated by Figure 13:
As a result, we can use POINTER 1 and POINTER 2 to overwrite a memory address with a specific value. Once a specific memory address has been written, there are many ways to take control of the Windows kernel, for example, by overwriting the Service Descriptor Table (SDT). That may be known by many, and for the rest, it is a story for some other time.
It is worth noting that this technique will not work in Windows 7 because some additional checks will make the heap coalesce technique fail.
- Nicolas Economou, Senior Exploit Writer, CORE Labs