Decode segfault errors in dmesg
You are writing a C program. You build it. You run it.
$ ./foo
Segmentation fault
The machine harshly reminds you that you are human. But before rushing
to recompile your program with debugging symbols or adding printf()
calls here and there, have a look at the output of the Linux kernel:
$ dmesg
foo[1234]: segfault at 2a ip 0000000000400511 sp 00007fffe00a3260 error 4 in foo[400000+1000]
There are some hints in the output of dmesg:
- foo is the executable name
- 1234 is the process ID
- 2a is the faulty address in hexadecimal
- the value after ip is the instruction pointer
- the value after sp is the stack pointer
- error 4 is an error code
- the string at the end is the name of the virtual memory area (VMA)
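The error code is a combination of several error bits defined in the x86 page fault code of the Linux kernel (originally in fault.c; recent kernels keep the enum in arch/x86/include/asm/trap_pf.h):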
/*
 * Page fault error code bits:
 *
 *   bit 0 ==    0: no page found       1: protection fault
 *   bit 1 ==    0: read access         1: write access
 *   bit 2 ==    0: kernel-mode access  1: user-mode access
 *   bit 3 ==                           1: use of reserved bit detected
 *   bit 4 ==                           1: fault was an instruction fetch
 *   bit 5 ==                           1: protection keys block access
 *   bit 6 ==                           1: shadow stack access fault
 *   bit 15 =                           1: SGX MMU page-fault
 */
enum x86_pf_error_code {
    X86_PF_PROT  = 1 << 0,
    X86_PF_WRITE = 1 << 1,
    X86_PF_USER  = 1 << 2,
    X86_PF_RSVD  = 1 << 3,
    X86_PF_INSTR = 1 << 4,
    X86_PF_PK    = 1 << 5,
    X86_PF_SHSTK = 1 << 6,
    X86_PF_SGX   = 1 << 15,
};
Since you are executing a user-mode program, X86_PF_USER is set and the
error code is at least 4. If the invalid memory access is a write,
then X86_PF_WRITE is set as well. Thus:
- If the error code is 4, then the faulty memory access is a read from user space.
- If the error code is 6, then the faulty memory access is a write from user space.
Moreover, the faulty memory address in dmesg can help you identify
the bug. For instance, if the memory address is 0, the root cause is
probably a NULL pointer dereference.
The name of the VMA may indicate the location of the error:
#include <stdlib.h>

int main(void)
{
    free((void *) 42);
    return 0;
}
When executed, the program above triggers a segfault inside libc, so
the VMA name is that of the libc shared object. This suggests that a
function from libc was called with an invalid pointer:
progname[1234]: segfault at 22 ip 00007f6b2531473c sp 00007ffc7b2c5c30 error 4 in libc-2.31.so[7f6b252af000+14b000]
The fault handler is architecture dependent, so you will not observe
the same messages in dmesg on architectures other than x86. For
instance, on ARM no message is displayed unless the Linux kernel has
been built with CONFIG_DEBUG_USER.