Decode segfault errors in dmesg

Christophe Vu-Brugier

2014-06-14

You are writing a C program. You build it. You run it.

$ ./foo
Segmentation fault

The machine harshly reminds you that you are human. But before rushing to recompile your program with debugging symbols or adding printf() calls here and there, have a look at what the Linux kernel logged:

$ dmesg
foo[1234]: segfault at 2a ip 0000000000400511 sp 00007fffe00a3260 error 4 in foo[400000+1000]

There are some hints in the output of dmesg:

foo is the executable name
1234 is the process ID
2a is the faulting address in hexadecimal
the value after ip is the instruction pointer
the value after sp is the stack pointer
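error 4 is the page fault error code, printed in hexadecimal
the string at the end is the name of the virtual memory area (VMA) that contains the instruction pointer, followed in brackets by the VMA start address and size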
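As a minimal sketch of how these fields line up, the message can be split with a single sscanf() call. The struct segv_info and the function parse_segv_line() are hypothetical names chosen for illustration, not kernel or libc APIs. The sketch assumes the exact x86 layout shown above, with no leading timestamp such as the "[ 123.456]" prefix that dmesg often prints.

```c
#include <stdio.h>

/* Fields of an x86 "segfault at ..." line. Names are illustrative only. */
struct segv_info {
        char comm[64];                  /* executable name */
        int pid;                        /* process ID */
        unsigned long long addr;        /* faulting address */
        unsigned long long ip;          /* instruction pointer */
        unsigned long long sp;          /* stack pointer */
        unsigned long long error;       /* page fault error code */
        char vma[256];                  /* VMA name */
        unsigned long long vma_base;    /* VMA start address */
        unsigned long long vma_size;    /* VMA size in bytes */
};

/*
 * Parse one line. Returns 0 on success, -1 if the line does not match
 * the expected layout. All numbers except the PID are hexadecimal.
 */
int parse_segv_line(const char *line, struct segv_info *out)
{
        int n = sscanf(line,
                       "%63[^[][%d]: segfault at %llx ip %llx sp %llx"
                       " error %llx in %255[^[][%llx+%llx]",
                       out->comm, &out->pid, &out->addr, &out->ip,
                       &out->sp, &out->error, out->vma,
                       &out->vma_base, &out->vma_size);

        return n == 9 ? 0 : -1;
}
```

Fed the line from the example, this yields comm "foo", pid 1234, addr 0x2a, error 4, and a VMA named foo starting at 0x400000 with size 0x1000.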

The error code is a combination of several error bits defined in fault.c in the Linux kernel:

/*
 * Page fault error code bits:
 *
 *   bit 0 ==    0: no page found       1: protection fault
 *   bit 1 ==    0: read access         1: write access
 *   bit 2 ==    0: kernel-mode access  1: user-mode access
 *   bit 3 ==                           1: use of reserved bit detected
 *   bit 4 ==                           1: fault was an instruction fetch
 *   bit 5 ==                           1: protection keys block access
 *   bit 6 ==                           1: shadow stack access fault
 *   bit 15 =                           1: SGX MMU page-fault
 */
enum x86_pf_error_code {
        X86_PF_PROT     =               1 << 0,
        X86_PF_WRITE    =               1 << 1,
        X86_PF_USER     =               1 << 2,
        X86_PF_RSVD     =               1 << 3,
        X86_PF_INSTR    =               1 << 4,
        X86_PF_PK       =               1 << 5,
        X86_PF_SHSTK    =               1 << 6,
        X86_PF_SGX      =               1 << 15,
};

Since you are executing a user-mode program, X86_PF_USER is set and the error code is at least 4. If the invalid memory access is a write, then X86_PF_WRITE is set as well. Thus:

If the error code is 4, then the faulty memory access is a read from user space.
If the error code is 6, then the faulty memory access is a write from user space.
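The same reasoning extends to the other bit combinations. Below is a small sketch that turns an error code into words. The bit values are copied from the kernel enum quoted above, while the PF_* macro names and the function describe_pf_error() are hypothetical helpers for illustration. Only the four bits most commonly seen for user-space faults are decoded.

```c
#include <stdio.h>

/* Bit values taken from the x86_pf_error_code enum quoted above. */
#define PF_PROT         (1UL << 0)
#define PF_WRITE        (1UL << 1)
#define PF_USER         (1UL << 2)
#define PF_INSTR        (1UL << 4)

/*
 * Write a short description of `code` into buf and return buf.
 * Remember that dmesg prints the code in hexadecimal.
 */
char *describe_pf_error(unsigned long code, char *buf, size_t len)
{
        const char *mode = (code & PF_USER) ? "user-mode" : "kernel-mode";
        const char *access = (code & PF_INSTR) ? "instruction fetch" :
                             (code & PF_WRITE) ? "write" : "read";
        const char *cause = (code & PF_PROT) ? "protection fault"
                                             : "no page found";

        snprintf(buf, len, "%s %s, %s", mode, access, cause);
        return buf;
}
```

With this helper, code 4 is described as "user-mode read, no page found", code 6 as "user-mode write, no page found", and code 7 as "user-mode write, protection fault", which is what a write to a read-only mapping would produce.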

Moreover, the faulting memory address in dmesg can help you identify the bug. For instance, if the address is 0 or a very small value, the root cause is probably a NULL pointer dereference, possibly through a structure member or an array index.

The name of the VMA may indicate the location of the error:

#include <stdlib.h>

int main(void)
{
        free((void *) 42);      /* 42 was never returned by malloc() */
        return 0;
}

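When executed, the program above triggers a segfault and the VMA name is libc, which suggests that a libc function was called with an invalid pointer. Here the faulting address 22 is 0x2a (42 in decimal) minus 8, consistent with free() reading allocator metadata stored just before the pointer it was given: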

progname[1234]: segfault at 22 ip 00007f6b2531473c sp 00007ffc7b2c5c30 error 4 in libc-2.31.so[7f6b252af000+14b000]
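The bracketed numbers also let you compute how far into the VMA the faulting instruction sits, which can be a starting point for locating it with a disassembler. The helper below is a hypothetical sketch, not part of any library. It assumes the bracket holds the VMA start address and size, and it returns an offset relative to the VMA only. If the VMA maps the file at a non-zero file offset, as is common for the executable segment of shared libraries, that file offset has to be added before looking the address up in the binary.

```c
/*
 * Compute the offset of ip inside the VMA [base, base + size).
 * Returns 0 and stores the offset in *off, or -1 if ip lies outside.
 */
int vma_offset(unsigned long long ip, unsigned long long base,
               unsigned long long size, unsigned long long *off)
{
        if (ip < base || ip - base >= size)
                return -1;

        *off = ip - base;
        return 0;
}
```

For the libc line above, ip 00007f6b2531473c and base 7f6b252af000 give an offset of 0x6573c into the VMA.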

The fault handler is architecture dependent, so you will not see the same messages in dmesg on architectures other than x86. For instance, on ARM no message is displayed unless the Linux kernel has been built with CONFIG_DEBUG_USER.
