Getting Physical: Extreme abuse of Intel based Paging Systems - Part 3 - Windows HAL's Heap

August 25, 2016

Continuing with my Getting Physical blog posts series (CanSec2016’s presentation), in this third episode I’m going to talk about how Windows Paging is related to the HAL's heap and how it can be abused by kernel exploits.

This is probably the simplest way of abusing Windows paging structures, because deep knowledge about how Intel paging works is not necessary to implement the attack.

"Windows 10" Anniversary Update - notes

After installing and testing the 64-bit "Windows 10" Anniversary Update (version 1607), I can confirm what Microsoft announced in its presentation "Windows 10 Mitigation Improvements" at BH-USA-2016: Windows has started to randomize its paging tables.

To confirm that, I wrote a "test.exe" program that contains a breakpoint in the "main" function.
When this program is executed, a breakpoint is hit in the kernel debugger.

Before, when the "!pte eip" command was executed in Windows 10 "version 1511" (TH2), the result was the following:

w10-old-pxe

Now, in Windows 10 "version 1607", the result is different:

w10-new-pxe

It's clear that something has changed: the Windows kernel debugger is no longer able to read the PAGE TABLE memory address located at 0xFFFFF6BF'FBBA0088.

It means that the paging tables that map the code pointed to by the instruction pointer are not present; at least, not at the usual addresses.

As we proposed in our presentation "Windows SMEP bypass: U=S", given in October 2015 at Ekoparty with @KiqueNissim:

ekoparty-slide

The only way to randomize paging tables that use the self-referential technique is by randomizing the position of the PML4 self-ref entry.
In 64-bit versions of Windows, it used to be located at position 0x1ED.

If the PML4 self-ref entry is moved by +/-1 position, every paging table entry shifts by +/- 512GB (0x80'00000000) from its original virtual address. Knowing that, I started to manually look for the PTE that maps RIP, and the result was this:

w10-new-pxe2

It means that, after rebooting the virtual machine, Windows has chosen the position 0x1ED minus 0xE6 (0x107) as self-ref entry.

This behavior proves that Windows has stopped using the position 0x1ED in the PML4, and now, it changes after every reboot.
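The arithmetic behind this search can be sketched in C. This is only a minimal sketch, assuming the standard 4-level x64 layout (9 index bits per level, 48-bit canonical addresses); the names `pte_base` and `pte_address` are mine and are not Windows symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 48-bit value to a canonical 64-bit virtual address. */
static uint64_t canonical(uint64_t va48)
{
    if (va48 & (1ULL << 47))
        va48 |= 0xFFFF000000000000ULL;
    return va48;
}

/* Start of the 512GB region where the PTEs become visible when the
   PML4 self-ref entry sits at `selfref_index` (hypothetical helper). */
uint64_t pte_base(unsigned selfref_index)
{
    assert(selfref_index < 512);
    return canonical((uint64_t)selfref_index << 39);
}

/* Virtual address of the PTE that maps `va` (hypothetical helper). */
uint64_t pte_address(uint64_t va, unsigned selfref_index)
{
    uint64_t page_number = (va >> 12) & 0xFFFFFFFFFULL; /* VA bits 47..12 */
    return pte_base(selfref_index) + page_number * 8;   /* 8 bytes per PTE */
}
```

With index 0x1ED, `pte_base` gives 0xFFFFF680'00000000; moving the index by one position moves the base by 0x80'00000000, and index 0x107 gives 0xFFFF8380'00000000.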

To be clear, from NOW on it's no longer possible to use arbitrary writes against the Windows 10 paging tables in a 100% reliable way, because the probability of hitting the right one is close to 1/256 (256 kernel entries).

Windows HAL's Heap

When Windows is booting, one of the first modules to be loaded is HAL.DLL (Hardware Abstraction Layer).

This module, which runs in kernel mode, is used to abstract the Windows kernel from basic hardware like APIC, I/O PORTS, etc.

In this way, the Windows kernel is able to interact with different architectures by calling the same HAL.DLL exported functions.

To run, HAL.DLL needs a stack and a heap, like 99.99% of the modules written for Windows.

The most interesting thing can be seen on the HEAP side, because it's created by HAL.DLL during the boot process.

What is really interesting here is that this HEAP is ALWAYS mapped at the same virtual address, at least since Windows 2000.

This attack vector was mentioned some time ago in a very interesting presentation called "Bypassing kernel ASLR - Target: Windows 10 (remote bypass)".

The virtual address used for the HAL's heap initial address is:
- Windows 32 bits - 0xffd00000
- Windows 64 bits - 0xffffffff'ffd00000

Here, we can see a table with the latest 64-bit Windows versions:

hal-list

If we look at the right column ("Physical Address"), we can see that the physical address used by the HAL's heap is FIXED for each Windows version, although it differs from one version to another.

Both for Windows 8.1 and Windows 10 (including "Anniversary Update"), the PHYSICAL address used by the virtual address 0xFFFFFFFF'FFD00000 is 0x1000 (PFN 0x1).

This means that both PHYSICAL and VIRTUAL addresses are NOT randomized!

Now, the HAL's heap became really interesting since Windows 8, when it started to contain a function pointer list called "HalpInterruptController".

This list is exported by HAL.DLL and can be found by using the following command:

dq poi(hal!HalpInterruptController)

Let's see an example:

w81-hal

One of the most interesting pointers, at least in the 64 bits versions, is “hal!HalpApicRequestInterrupt”, located in the position 15 (offset 0x78) of this table:

w81-hal-2

This function pointer is used all the time by the Windows kernel, which means that if we overwrite it, we will get controlled execution quickly.

It's important to say that, for a given Windows version and target configuration (number of CPUs), this function pointer list tends to be kept at the same virtual address, and that it can be overwritten by a simple arbitrary write, because it's mapped as writable.

So, if it's kept at a fixed virtual address, it means that it can be abused by LOCAL and REMOTE exploits.

Testing Windows Paging

Before we start listing the most useful arbitrary write cases against Windows paging structures, let's do a little demo.

I created a program called "test2.exe" with the following code:

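#include <windows.h>
#include <stdio.h>
#include <string.h>

int main ( void )
{
 void *p = (void *) 0x1000000;

// Allocating memory at a fixed virtual address
 p = VirtualAlloc ( p, 0x1000, MEM_RESERVE|MEM_COMMIT, PAGE_EXECUTE_READWRITE );
 if ( p == NULL )
 {
  printf ( "VirtualAlloc failed: %lu\n", GetLastError () );
  return 1;
 }
 printf ( "p = %p\n", p );

// Setting memory (memset returns p, so RAX holds the address at the breakpoint)
 memset ( p, 0x41, 0x1000 );

// Hitting breakpoint
 __debugbreak ();
 return 0;
}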

When this program is executed, we hit a breakpoint in the Windows kernel debugger.

Knowing that this program allocates memory at 0x1000000 (RAX value), let's see its content:

test2-1

Now, let's execute the command "!pte rax" to obtain the PAGE TABLE entry used to map this address:

test2-2

In the screenshot above, we can see that the PTE used to map the virtual address 0x1000000 points to the PHYSICAL address 0x23FD2000 (lowest 12 bits are ignored).

Here comes the interesting part. To simulate an arbitrary write, let's use the same kernel debugger to overwrite the physical address used by this PTE with the physical address of the HAL's heap (0x1000):

test2-3
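What this debugger write does to the entry can be sketched as follows. It's only an illustration: the helper names are hypothetical, the mask assumes the usual x64 PTE layout (frame address in bits 51..12), and the sample entry value in the note below is made up, since only the frame address 0x23FD2000 comes from the demo.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_FRAME_MASK 0x000FFFFFFFFFF000ULL /* bits 51..12: physical frame */

/* Physical address of the frame a PTE points to. */
uint64_t pte_phys(uint64_t pte)
{
    return pte & PTE_FRAME_MASK;
}

/* Return `pte` with its frame replaced by `phys`, keeping every flag bit. */
uint64_t pte_retarget(uint64_t pte, uint64_t phys)
{
    assert((phys & ~PTE_FRAME_MASK) == 0); /* page aligned, below 2^52 */
    return (pte & ~PTE_FRAME_MASK) | phys;
}
```

For a hypothetical entry 0x23FD2867, `pte_retarget(entry, 0x1000)` yields 0x1867: the same flags, now pointing at PFN 1.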

Now, we trace one instruction to refresh the TLBs and then we dump the content of 0x1000000 ("rax" register).
We can see that the HAL's heap has been mapped at this address, which means that we can now read/write/execute this memory area from USER SPACE :-)

test2-4

To confirm that we are really seeing the HAL's heap, let's check if we are able to read the "hal!HalpInterruptController" function pointer table.

test2-5

Indeed, when the offset 0x4a0 is added to the memory page mapped at 0x1000000, we can see this table.

For the last test, let's overwrite the address 0x1000000+0x518 (“hal!HalpApicRequestInterrupt” pointer offset) with the value 0x41414141'41414141:

test2-6

and let's continue with the normal execution ...

test2-7

Oops! ... this function pointer has been used by the Windows kernel almost instantly :-)
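The offsets used in this demo follow from the table position given earlier (entry 15, 8 bytes per pointer). A minimal sketch, assuming the table offset 0x4a0 observed on this particular target; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define INTERRUPT_CTRL_OFFSET 0x4a0u /* table offset inside the page, as seen in this demo */
#define APIC_REQUEST_INDEX    15u    /* position of hal!HalpApicRequestInterrupt */

/* Byte offset of entry `index` in a table of 64-bit pointers. */
uint64_t entry_offset(unsigned index)
{
    return (uint64_t)index * sizeof(uint64_t);
}

/* User-space address of the pointer once the HAL's heap page is
   visible at `mapped_base`. */
uint64_t apic_request_interrupt_va(uint64_t mapped_base)
{
    assert((mapped_base & 0xFFF) == 0); /* the mapping is page aligned */
    return mapped_base + INTERRUPT_CTRL_OFFSET + entry_offset(APIC_REQUEST_INDEX);
}
```

Here `entry_offset(15)` is 0x78, so with the page mapped at 0x1000000 the pointer sits at 0x1000518. The 0x4a0 value should be checked on each target, since it may vary with the Windows version and configuration.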

Shooting all valid ways of kernel arbitrary writes

Whatever kind of arbitrary write we have, fully controllable or not, and whether it lets us write one byte, one word, one dword or one qword, we will see that it's possible to find a way to map the heap created by HAL.DLL in USER SPACE!

Let's start by understanding that, when we call VirtualAlloc to allocate memory, the Windows kernel usually creates PAGE TABLE entries, and depending on the virtual address allocated by this function, it could be necessary to create more entries in higher paging levels like PML4 or PDPT.

For example, if we want to allocate 4KB at 0x1000000, we would do something like this:

VirtualAlloc ( (LPVOID) 0x1000000, 0x1000, MEM_RESERVE|MEM_COMMIT, PAGE_EXECUTE_READWRITE );

After allocation, if we execute the command "!pte 0x1000000" in KD, we can see the PTE created by the kernel:

pte-overwrite-1

Now, if we look at the consecutive PTEs, we can see that all of them are empty:

pte-overwrite-2

It makes sense, because only 4KB have been mapped at virtual address 0x1000000; the rest of the entries are unused.
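To see why the neighbouring PTEs live in the same table, a virtual address can be split into its paging indexes. This is a minimal sketch assuming 4-level x64 paging and user-mode (lower half) addresses only; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct va_indexes {
    unsigned pml4, pdpt, pd, pt; /* index into each paging level */
    unsigned offset;             /* byte offset inside the 4KB page */
};

/* Split a user-mode virtual address into its paging indexes. */
struct va_indexes split_va(uint64_t va)
{
    struct va_indexes ix;
    assert(va < (1ULL << 47)); /* this sketch ignores kernel addresses */
    ix.pml4   = (unsigned)((va >> 39) & 0x1FF);
    ix.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    ix.pd     = (unsigned)((va >> 21) & 0x1FF);
    ix.pt     = (unsigned)((va >> 12) & 0x1FF);
    ix.offset = (unsigned)(va & 0xFFF);
    return ix;
}

/* Two addresses share a PAGE TABLE when they fall in the same 2MB range. */
int same_page_table(uint64_t a, uint64_t b)
{
    return (a >> 21) == (b >> 21);
}
```

For 0x1000000 this gives PD index 8 and PT index 0, so the other 511 entries of that PAGE TABLE cover 0x1001000 up to 0x11FF000.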

Now, what would happen if we use an arbitrary write to create our own entries?

- Using arbitrary writes to create new paging entries

The trick here is to make sure that the PAGE TABLE where we want to point our arb.write is present.

The best way to do that is by allocating 4KB in an unused 2MB memory range.

This is to make sure that a new PAGE TABLE with only a single used entry will be created.

The most interesting thing here is that, as the Windows kernel neither uses nor checks the empty entries, they are ignored.

So, it's possible to create spurious entries during Windows kernel exploitation (or post exploitation) without the Windows memory manager knowing about them, and when the exploitation process finishes, no BSoD will appear ... ;-)

- Using 2-byte/4-byte/8-byte arbitrary writes

Let's start with the simplest case, where we have a kernel bug that allows us to write a word, dword or qword where we want.

In the following examples, all our arbitrary writes will be used to map the HAL's HEAP in user space by creating a new paging entry.

To create a PTE that allows us to do this it's necessary to have an arb.write like this:

- "67 10" 00 00 00 00 NN NN

The value 0x67 means DIRTY + ACCESSED + USER + WRITABLE + PRESENT.

It's not really necessary to set the DIRTY and ACCESSED flags, because they are set by the CPU at runtime, so we could use the value 0x07 instead of 0x67.

The XX 1X 00 00 value represents the PFN number 1, which is equal to the physical address 0x1000.
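The value above can be rebuilt from its individual bits. A minimal sketch, assuming the standard x64 PTE flag positions; the enum and function names are mine:

```c
#include <assert.h>
#include <stdint.h>

enum {
    PG_PRESENT  = 0x01,
    PG_WRITABLE = 0x02,
    PG_USER     = 0x04,
    PG_ACCESSED = 0x20,
    PG_DIRTY    = 0x40
};

/* Build a PTE from a page frame number and its low flag bits. */
uint64_t make_pte(uint64_t pfn, unsigned flags)
{
    assert(flags <= 0xFFF); /* flags live in the low 12 bits */
    return (pfn << 12) | flags;
}

/* Store the low `n` bytes of `value` in little-endian order,
   which is how the entry looks in memory. */
void to_le_bytes(uint64_t value, uint8_t *out, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}
```

`make_pte(1, PG_DIRTY | PG_ACCESSED | PG_USER | PG_WRITABLE | PG_PRESENT)` is 0x1067, which is stored in memory as the bytes "67 10"; with only USER + WRITABLE + PRESENT it is 0x1007.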

Now, let's simulate a 2-byte arbitrary write by using the kernel debugger:

pte-overwrite-4

We can see that, before overwriting the PTE that maps the virtual address 0x1001000, it wasn't possible to read the content of this address.
After overwriting this empty entry, we got access from USER SPACE to the HAL's heap.

Now, if we only control the highest part of our arbitrary write, we could overwrite two consecutive empty-entries like this:

     PREVIOUS PTE | NEXT PTE
NN NN NN NN NN NN | "67 10"

In this way we get the same result as above.
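The shifted variant can be simulated on a plain byte array. This is just a sketch with hypothetical helper names; the array stands in for a PAGE TABLE and nothing here touches real paging structures.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_SIZE 8

/* Read entry `index` of a simulated table as a little-endian 64-bit value. */
uint64_t read_entry(const uint8_t *table, unsigned index)
{
    uint64_t v = 0;
    for (int i = ENTRY_SIZE - 1; i >= 0; i--)
        v = (v << 8) | table[index * ENTRY_SIZE + i];
    return v;
}

/* Simulate an 8-byte write whose low 6 bytes are uncontrolled (`junk`)
   and whose high 2 bytes are controlled, aimed 6 bytes before entry
   `target` so the controlled bytes land at the start of that entry. */
void shifted_write(uint8_t *table, unsigned target, uint64_t junk, uint16_t controlled)
{
    assert(target >= 1); /* a previous entry must absorb the junk bytes */
    uint8_t *dst = table + target * ENTRY_SIZE - 6;
    for (int i = 0; i < 6; i++)
        dst[i] = (uint8_t)(junk >> (8 * i));
    dst[6] = (uint8_t)(controlled & 0xFF);
    dst[7] = (uint8_t)(controlled >> 8);
}
```

The previous entry ends up full of junk in its upper six bytes, but its lowest byte (where the PRESENT bit lives) is untouched, so it stays non-present while the next entry becomes 0x1067.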

- Using 1-byte/2-byte/4-byte/8-byte arbitrary writes

In this case, we are going to analyze the most interesting arbitrary write against paging tables where, instead of creating a PAGE TABLE entry, we are going to create a PAGE DIRECTORY entry.

For this one, we will use our arb.write to create a LARGE PAGE, and we will use 0x00 (NULL address) as the physical address.

To create a LARGE PAGE, we need to turn on the PS bit (Page Size):

pxe-format

If this physical address points to the NULL address, it means that this PDE is mapping the 0~2MB memory range, which includes the HAL's heap.

This is really cool because we are able to do that by using a single 1-byte arbitrary write :-D

The arbitrary write that we need to use should look like this:

- "e7" 00 00 00 00 00 NN NN

Or, if we only control the highest part, we should overwrite 2 consecutive empty-entries like this:

        PREVIOUS PDE | NEXT PDE
NN NN NN NN NN NN NN | e7

In the same way as PTEs, it's necessary to make sure that the PAGE DIRECTORY that we are going to overwrite is present.
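The effect of the 1-byte write on an empty PDE can be sketched like this. It assumes the usual x64 PDE layout (PS at bit 7, 2MB frame in bits 51..21); the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_PRESENT       0x01
#define PDE_PS            0x80                  /* Page Size: entry maps a 2MB page */
#define LARGE_FRAME_MASK  0x000FFFFFFFE00000ULL /* bits 51..21 */

/* Overwrite only the lowest byte of an 8-byte entry. */
uint64_t write_low_byte(uint64_t entry, uint8_t value)
{
    return (entry & ~0xFFULL) | value;
}

/* A PDE maps a LARGE PAGE when it is present and has PS set. */
int is_large_page(uint64_t pde)
{
    return (pde & PDE_PRESENT) && (pde & PDE_PS);
}

/* Physical base of the 2MB range mapped by a large-page PDE. */
uint64_t large_page_phys_base(uint64_t pde)
{
    assert(is_large_page(pde));
    return pde & LARGE_FRAME_MASK;
}
```

Starting from an empty entry, `write_low_byte(0, 0xE7)` gives a present, user-accessible large page whose physical base is 0, which is why a single byte is enough.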

Knowing that a PAGE DIRECTORY TABLE can address up to 2MB * 512 entries (1GB), it's advisable that our memory allocation creates a PAGE DIRECTORY away from memory areas previously allocated.

To do this, the best option is to allocate the first 4KB at a memory address that is a multiple of one gigabyte, like 1GB, 2GB, nGB.

In this example, I'm going to use the virtual address 1GB (0x40000000).

VirtualAlloc ( (LPVOID) 0x40000000, 0x1000, MEM_RESERVE|MEM_COMMIT, PAGE_EXECUTE_READWRITE );

After allocation, when we read the PAGE DIRECTORY table, we can see the entry created by Windows:

pde-1

At the same time, we can see that the following entries are empty, so it means that there is no virtual memory mapped in 1GB+2MB, 1GB+4MB, etc.

So, it's easy to deduce that we are going to use our arb.write against one of them.

For our example, I'm going to use the first empty PDE, which maps the range 1GB+2MB~1GB+4MB.

pde-2

In the above screenshot, we can see that there is no memory mapped at 1GB+2MB (0x40200000), and that its PDE is obviously empty.

Now, let's simulate a 1-byte arbitrary write, and then let's read the content of the address 1GB+2MB (0x40200000):

pde-3

Once the arb.write was used to create a LARGE PAGE, we can see that when we read at 0x40200000, we are seeing the very old and famous IVT (Interrupt Vector Table) located at the physical address 0x00 (PFN 0) :-D

If we move to offset 0x14A0, we can see the "hal!HalpInterruptController" table:

pde-4
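The 0x14A0 offset is simply the physical address of the table (HAL's heap at 0x1000 plus 0x4a0) seen through a large page that starts at physical address 0. A minimal sketch with a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

#define LARGE_PAGE_SIZE 0x200000ULL /* 2MB */

/* Virtual address at which physical address `phys` becomes visible
   through a 2MB large page mapped at `va_base`, whose physical base
   is `phys_base`. */
uint64_t va_for_phys(uint64_t va_base, uint64_t phys_base, uint64_t phys)
{
    assert(va_base % LARGE_PAGE_SIZE == 0 && phys_base % LARGE_PAGE_SIZE == 0);
    assert(phys >= phys_base && phys - phys_base < LARGE_PAGE_SIZE);
    return va_base + (phys - phys_base);
}
```

With the large page at 1GB+2MB, that is `(1ULL << 30) + 0x200000` = 0x40200000, `va_for_phys(0x40200000, 0, 0x1000 + 0x4a0)` returns 0x402014A0.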

It's important to say that we could allocate in any virtual address, and then we could find an empty PDE to do exactly the same.

I used an empty PAGE DIRECTORY TABLE to simplify the explanation.

- Decrementing a memory address in one unit

This is one of my favorite arb.writes, because it lets the skill of the exploit writer show in many different ways.

In general, we find this kind of scenario when we have a UAF (Use After Free), usually in win32k.sys.

The instruction that allows us to do that is almost always:

- dec [reg+0xNN]

Some time ago, one of the simplest ways to abuse this kind of arbitrary write was by using Cesar Cerrudo's "Second trick", where the SEP_TOKEN_PRIVILEGES field of the PROCESS TOKEN structure was decremented by 1.

With this trick, it was possible to enable some special bits that grant privileges such as injecting code into system processes like "lsass.exe".

Since Windows 8.1, processes running at Low Integrity Level are no longer able to get PROCESS TOKEN addresses by calling NtQuerySystemInformation with the "SystemInformationClass" parameter equal to "SystemHandleInformation".

As a replacement for this technique, we can use our arbitrary decrement to create a LARGE PAGE :-)

Reviewing the last technique explained above, to create a PAGE DIRECTORY entry that works as a LARGE PAGE it's necessary to use the value 0xE7:

0xE7 = PAGE SIZE (2MB large page) + DIRTY + ACCESSED + USER + WRITABLE + PRESENT

Now, if we use the "dec" instruction to decrement, in a shifted way, two consecutive PAGE DIRECTORY empty entries like this:

        PREVIOUS PDE | NEXT PDE
FF FF FF FF FF FF FF | FF 00 00 00 00 00 00 00

we will turn on all the bits in the lowest byte of the next PDE, resulting in:

0xFF = PAGE SIZE (2MB large page) + DIRTY + ACCESSED + PCD (PAGE-LEVEL CACHE DISABLED) + PWT (PAGE-LEVEL WRITE THROUGH) + USER + WRITABLE + PRESENT

Fortunately, this combination of bits works, allowing us to create a LARGE PAGE that uses the physical address 0x00 (NULL address)!
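The shifted decrement can be simulated on a byte array standing in for a PAGE DIRECTORY. This is only a sketch: the helpers are hypothetical and the 64-bit "dec" is emulated with explicit little-endian loads and stores.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_SIZE 8

/* Little-endian 64-bit load from an arbitrary (unaligned) position. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Little-endian 64-bit store to an arbitrary (unaligned) position. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

uint64_t read_entry(const uint8_t *table, unsigned index)
{
    return load_le64(table + index * ENTRY_SIZE);
}

/* Emulate "dec qword [addr]" with addr 7 bytes before entry `target`,
   so the operand covers 7 bytes of the previous entry and only the
   lowest byte of the target entry. */
void shifted_dec(uint8_t *table, unsigned target)
{
    assert(target >= 1);
    uint8_t *addr = table + target * ENTRY_SIZE - 7;
    store_le64(addr, load_le64(addr) - 1);
}
```

Starting from two empty entries, the previous PDE becomes 0xFFFFFFFF'FFFFFF00 (lowest byte still zero, so still not present) and the target PDE becomes 0xFF.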

Simulating a shifted decrement on the PDEs that map the virtual address range 1GB+4MB ~ 1GB+6MB, we would see something like this:

pde-5

After decrementation, the last 0xFF value maps a LARGE PAGE that starts at 0x00 physical address and finishes at 2MB physical address, where the whole HAL's heap is contained :-D

pde-6

Although I have only mentioned the most common arb.write cases, it's important to point out that we could use other variants of them, because it's only necessary to turn on some specific bits to create valid entries.

Special comments

- SMEP bypass

After using an arb.write, we will have access to the HAL's heap from user space with read/write privileges.

Now, when the “hal!HalpApicRequestInterrupt” function pointer is modified, we are able to jump wherever we want.

There is a problem here: at the same time that the HAL.DLL function pointer list was introduced in Windows 8, Windows started to support SMEP (Supervisor Mode Execution Prevention), which forces us to bypass it when it is supported by the target.

The best way to do that is by ROPing to HAL.DLL, because its base address can be calculated by reading the function pointers contained by this table.

Some of the ways to do that were explained in this presentation.

- Process context after overwriting HAL's HEAP function pointers

It's VERY IMPORTANT to say that, when this function pointer is overwritten, control of RIP won't necessarily be taken in our process context, because the pointer is used all the time by the Windows kernel, independently of the current process.

This makes exploitation much more complicated when a ROP chain is used, because the addresses of the ROP gadgets have to be contained in a stack created by the exploit itself.

As a simple solution for this problem, after overwriting the “hal!HalpApicRequestInterrupt” function pointer, we could consume 100% of the CPU by using a simple "while(1)", and then just wait for the Windows kernel invocation.

To be continued …

 
