I discovered a neat little trick on Linux: on x86 (and a few other less common architectures), it’s possible to determine from an unprivileged process whether an address residing within the kernel address space is mapped or unmapped.
The top-level page fault handler on x86 is do_page_fault(), found in arch/x86/mm/fault.c. When the CPU fires a page fault exception, it pushes an error code onto the stack, which is accessible as an argument to the page fault handler.
When a userland process attempts to access unmapped memory or memory whose page permissions do not allow the desired type of access, the following code path is invoked:
do_page_fault()
__do_page_fault()
bad_area_nosemaphore()
__bad_area_nosemaphore()
show_signal_msg()
This last function prints a message to the kernel syslog with information about the unhandled SIGSEGV that is delivered as a result of the invalid memory access:
static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
                unsigned long address, struct task_struct *tsk)
{
        if (!unhandled_signal(tsk, SIGSEGV))
                return;

        if (!printk_ratelimit())
                return;

        printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
                task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
                tsk->comm, task_pid_nr(tsk), address,
                (void *)regs->ip, (void *)regs->sp, error_code);

        print_vma_addr(KERN_CONT " in ", regs->ip);

        printk(KERN_CONT "\n");
}
Note that the error_code printed to the syslog has been passed down all the way from the top-level page fault handler. It’s worth taking a look at what the bits of this error code correspond to. Most importantly, bit 0 is the Present flag: it is 0 when the fault was caused by a page that is not present, and 1 when the page is present but the access violated its protections. Bit 1 is the Read/Write flag, which is 0 for a read fault and 1 for a write fault. Bit 2 is the User/Supervisor flag, which is 0 when the fault occurred in supervisor (kernel) mode and 1 when it occurred in user mode.
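To make the bit layout concrete, here is a minimal sketch of a decoder for the three low bits described above. The macro names, the struct and the function name are all made up for illustration and are not taken from the kernel source.

```c
#include <stdbool.h>

/* Hypothetical names for the three low bits of the x86 page fault
 * error code, as described in the text. */
#define PF_BIT_PRESENT 0x1UL  /* bit 0: 1 = page present (protection fault) */
#define PF_BIT_WRITE   0x2UL  /* bit 1: 1 = write access, 0 = read access   */
#define PF_BIT_USER    0x4UL  /* bit 2: 1 = fault occurred in user mode     */

struct pf_info {
    bool present;  /* the faulting address maps to a present page */
    bool write;    /* the access was a write                      */
    bool user;     /* the access came from user mode              */
};

/* Split an error code, as printed after "error" in the syslog line,
 * into its three flags. */
static struct pf_info decode_pf_error(unsigned long error_code)
{
    struct pf_info info;

    info.present = (error_code & PF_BIT_PRESENT) != 0;
    info.write   = (error_code & PF_BIT_WRITE) != 0;
    info.user    = (error_code & PF_BIT_USER) != 0;
    return info;
}
```

With this, decode_pf_error(5) reports a user-mode read on a present page, and decode_pf_error(4) reports a user-mode read on a non-present page.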
In other words, regardless of whether the attempted access resides in user or kernel space, the error code logged to the syslog indicates whether the address corresponds to a present or absent page. This can be verified as follows:
$ cat trick.c
#include <stdlib.h>

int main(int argc, char **argv)
{
        int *ptr, foo;
        ptr = (int *)strtoul(argv[1], NULL, 16);
        foo = *ptr;
}
$ ./trick ffffffff81aa3690
Segmentation fault (core dumped)
$ ./trick ffffffffc1aa3690
Segmentation fault (core dumped)
$ dmesg | grep segfault
[391396.756467] trick[31865]: segfault at ffffffff81aa3690 ip 0000000000400528 sp 00007fff7c026ba0 error 5 in trick[400000+1000]
[391404.736606] trick[31872]: segfault at ffffffffc1aa3690 ip 0000000000400528 sp 00007fff170fac60 error 4 in trick[400000+1000]
The first invocation deliberately causes an access violation on a mapped kernel address, resulting in an error code of 5 (a read violation from user mode on a present page). The second invocation causes an access violation on an unmapped kernel address, resulting in an error code of 4 (a read violation from user mode on a non-present page).
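Reading the error code off by eye works, but the check is easy to automate. The following is a rough sketch that parses one syslog line in the format printed by show_signal_msg() and reports whether the faulting page was present. The function name is hypothetical, and the parser assumes the exact "segfault at ... ip ... sp ... error ..." layout shown above, which may differ on other kernel versions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: inspect one dmesg line and return
 *    1 if the faulting address was on a present page,
 *    0 if the page was not present,
 *   -1 if the line is not in the expected segfault format. */
static int kernel_page_present(const char *line)
{
    unsigned long long address, error_code;
    const char *p = strstr(line, "segfault at ");

    if (p == NULL)
        return -1;

    /* %*s skips the ip and sp values, which are not needed here. */
    if (sscanf(p, "segfault at %llx ip %*s sp %*s error %llx",
               &address, &error_code) != 2)
        return -1;

    /* Bit 0 of the error code is the Present flag. */
    return (error_code & 0x1ULL) ? 1 : 0;
}
```

Fed the two syslog lines from the session above, this returns 1 for ffffffff81aa3690 and 0 for ffffffffc1aa3690.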
This trick is only possible if you can read the syslog in the first place, so the dmesg_restrict sysctl must be disabled.
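A quick way to check this precondition from a program is to read the sysctl through procfs. The sketch below assumes the value is exposed at /proc/sys/kernel/dmesg_restrict and that 0 means unrestricted; the helper names are made up for illustration.

```c
#include <stdio.h>

/* Read a single integer from a sysctl file under /proc/sys.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
static int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Returns 1 if dmesg_restrict is 0 (unprivileged reads allowed),
 * 0 if it is set to a non-zero value, -1 if it could not be read. */
static int dmesg_readable(void)
{
    int v = read_sysctl_int("/proc/sys/kernel/dmesg_restrict");

    if (v < 0)
        return -1;
    return v == 0;
}
```

If dmesg_readable() returns 0, the trick will not work for an unprivileged user on that machine.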
This entry was posted on Wednesday, February 6th, 2013 at 6:23 pm and is filed under Kernel, Linux.