Decode segfault errors in dmesg
You are writing a C program. You build it. You run it.
    $ ./foo
    Segmentation fault
The machine harshly reminds you that you are human. But before rushing
to re-compile your program with debugging symbols or adding printf()
calls here and there, have a look at the output of the Linux kernel:
    $ dmesg
    foo[1234]: segfault at 2a ip 0000000000400511 sp 00007fffe00a3260 error 4 in foo[400000+1000]
There are some hints in the output of dmesg:
- foo is the executable name
- 1234 is the process ID
- 2a is the faulty address in hexadecimal
- the value after ip is the instruction pointer
- the value after sp is the stack pointer
- error 4 is an error code
- the string at the end is the name of the virtual memory area (VMA)
The error code is a combination of several error bits defined in fault.c in the Linux kernel:
    /*
     * Page fault error code bits:
     *
     *   bit 0 ==    0: no page found       1: protection fault
     *   bit 1 ==    0: read access         1: write access
     *   bit 2 ==    0: kernel-mode access  1: user-mode access
     *   bit 3 ==                           1: use of reserved bit detected
     *   bit 4 ==                           1: fault was an instruction fetch
     *   bit 5 ==                           1: protection keys block access
     *   bit 6 ==                           1: shadow stack access fault
     *   bit 15 ==                          1: SGX MMU page-fault
     */
    enum x86_pf_error_code {
        X86_PF_PROT  = 1 << 0,
        X86_PF_WRITE = 1 << 1,
        X86_PF_USER  = 1 << 2,
        X86_PF_RSVD  = 1 << 3,
        X86_PF_INSTR = 1 << 4,
        X86_PF_PK    = 1 << 5,
        X86_PF_SHSTK = 1 << 6,
        X86_PF_SGX   = 1 << 15,
    };
Since you are executing a user-mode program, X86_PF_USER is set and the
error code is at least 4. If the invalid memory access is a write,
then X86_PF_WRITE is set as well. Thus:
- If the error code is 4, then the faulty memory access is a read from user space.
- If the error code is 6, then the faulty memory access is a write from user space.
Moreover, the faulty memory address in dmesg can help you identify
the bug. For instance, if the memory address is 0, the root cause is
probably a NULL pointer dereference.
The name of the VMA may indicate the location of the error:
    #include <stdlib.h>

    int main(void)
    {
        free((void *) 42);
        return 0;
    }
When executed, the program above triggers a segfault and the VMA name
is libc. This suggests that a function from libc was called with a
pointer that is invalid or no longer valid.
progname[1234]: segfault at 22 ip 00007f6b2531473c sp 00007ffc7b2c5c30 error 4 in libc-2.31.so[7f6b252af000+14b000]
The fault handler is architecture dependent, so you will not observe
the same messages in dmesg on architectures other than x86. For
instance, on ARM no message is displayed unless the Linux kernel has
been built with CONFIG_DEBUG_USER.