<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Enodev.fr / Christophe's log (Posts about Debugging)</title><link>https://www.enodev.fr/</link><description></description><atom:link href="https://www.enodev.fr/categories/debugging.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Sun, 10 Mar 2024 11:03:11 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Decode segfault errors in dmesg</title><link>https://www.enodev.fr/posts/decode-segfault-errors-in-dmesg.html</link><dc:creator>Christophe Vu-Brugier</dc:creator><description>&lt;p&gt;You are writing a C program. You build it. You run it.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;./foo
&lt;span class="go"&gt;Segmentation fault&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The machine harshly reminds you that you are human. But before rushing
to re-compile your program with debugging symbols or adding &lt;code&gt;printf()&lt;/code&gt;
calls here and there, have a look at the output of the Linux kernel:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;dmesg
&lt;span class="go"&gt;foo[1234]: segfault at 2a ip 0000000000400511 sp 00007fffe00a3260 error 4 in foo[400000+1000]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;There are some hints in the output of &lt;code&gt;dmesg&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;foo&lt;/code&gt; is the &lt;em&gt;executable name&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1234&lt;/code&gt; is the &lt;em&gt;process ID&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;2a&lt;/code&gt; is the &lt;em&gt;faulty address&lt;/em&gt; in hexadecimal&lt;/li&gt;
&lt;li&gt;the value after &lt;code&gt;ip&lt;/code&gt; is the &lt;em&gt;instruction pointer&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;the value after &lt;code&gt;sp&lt;/code&gt; is the &lt;em&gt;stack pointer&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;error 4&lt;/code&gt; is an &lt;em&gt;error code&lt;/em&gt;&lt;/li&gt;
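&lt;li&gt;the string at the end is the &lt;em&gt;name of the virtual memory area&lt;/em&gt; (&lt;abbr title="virtual memory area"&gt;VMA&lt;/abbr&gt;) that contains the instruction pointer, followed in brackets by its start address and size in hexadecimal&lt;/li&gt;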
&lt;/ul&gt;
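&lt;p&gt;As a rough sketch, the fields listed above can be extracted from such a
line with &lt;code&gt;sscanf()&lt;/code&gt;. The names &lt;code&gt;struct segv_msg&lt;/code&gt; and &lt;code&gt;parse_segv()&lt;/code&gt;
are made up for this illustration. The sketch assumes the x86 message format
shown above, a line without a timestamp prefix, and that every number
except the process ID is printed in hexadecimal:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;#include &amp;lt;stdio.h&amp;gt;

/* Fields of an x86 "segfault at" line (names are illustrative). */
struct segv_msg {
        char comm[64];              /* executable name */
        int pid;                    /* process ID */
        unsigned long addr, ip, sp; /* faulty address, instruction pointer, stack pointer */
        unsigned long err;          /* error code */
        char vma[64];               /* VMA name */
        unsigned long start, size;  /* VMA start address and size */
};

/* Parse one line; return 1 on success, 0 if the line does not match. */
int parse_segv(const char *line, struct segv_msg *m)
{
        int n = sscanf(line,
                       "%63[^[][%d]: segfault at %lx ip %lx sp %lx error %lx"
                       " in %63[^[][%lx+%lx]",
                       m-&amp;gt;comm, &amp;amp;m-&amp;gt;pid, &amp;amp;m-&amp;gt;addr, &amp;amp;m-&amp;gt;ip, &amp;amp;m-&amp;gt;sp,
                       &amp;amp;m-&amp;gt;err, m-&amp;gt;vma, &amp;amp;m-&amp;gt;start, &amp;amp;m-&amp;gt;size);
        return n == 9;
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Applied to the line above, this should give a process ID of 1234, a faulty
address of &lt;code&gt;0x2a&lt;/code&gt; and a VMA starting at &lt;code&gt;0x400000&lt;/code&gt; with a size of &lt;code&gt;0x1000&lt;/code&gt;.&lt;/p&gt;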
&lt;p&gt;The error code is a combination of several error bits defined in
&lt;a href="https://elixir.bootlin.com/linux/v6.7/source/arch/x86/include/asm/trap_pf.h#L6"&gt;fault.c&lt;/a&gt;
in the Linux kernel:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cm"&gt;/*&lt;/span&gt;
&lt;span class="cm"&gt; * Page fault error code bits:&lt;/span&gt;
&lt;span class="cm"&gt; *&lt;/span&gt;
&lt;span class="cm"&gt; *   bit 0 ==    0: no page found       1: protection fault&lt;/span&gt;
&lt;span class="cm"&gt; *   bit 1 ==    0: read access         1: write access&lt;/span&gt;
&lt;span class="cm"&gt; *   bit 2 ==    0: kernel-mode access  1: user-mode access&lt;/span&gt;
&lt;span class="cm"&gt; *   bit 3 ==                           1: use of reserved bit detected&lt;/span&gt;
&lt;span class="cm"&gt; *   bit 4 ==                           1: fault was an instruction fetch&lt;/span&gt;
&lt;span class="cm"&gt; *   bit 5 ==                           1: protection keys block access&lt;/span&gt;
&lt;span class="cm"&gt; *   bit 6 ==                           1: shadow stack access fault&lt;/span&gt;
&lt;span class="cm"&gt; *   bit 15 =                           1: SGX MMU page-fault&lt;/span&gt;
&lt;span class="cm"&gt; */&lt;/span&gt;
&lt;span class="k"&gt;enum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x86_pf_error_code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;X86_PF_PROT&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;X86_PF_WRITE&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;X86_PF_USER&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;X86_PF_RSVD&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;X86_PF_INSTR&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;X86_PF_PK&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;X86_PF_SHSTK&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;X86_PF_SGX&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since you are executing a user-mode program, &lt;code&gt;X86_PF_USER&lt;/code&gt; is set and the
error code is at least 4. If the invalid memory access is a write,
then &lt;code&gt;X86_PF_WRITE&lt;/code&gt; is set as well, which adds 2. Thus:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the error code is 4, then the faulty memory access is a &lt;em&gt;read&lt;/em&gt;
  from user space.&lt;/li&gt;
&lt;li&gt;If the error code is 6, then the faulty memory access is a &lt;em&gt;write&lt;/em&gt;
  from user space.&lt;/li&gt;
&lt;/ul&gt;
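&lt;p&gt;The same reasoning extends to the other bit combinations. Here is a minimal
sketch of a decoder; the &lt;code&gt;PF_*&lt;/code&gt; macros and the &lt;code&gt;describe_pf()&lt;/code&gt; helper are
hypothetical names that mirror the kernel values listed above, and only four
of the bits are handled:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;#include &amp;lt;stdio.h&amp;gt;

/* Bit values mirroring enum x86_pf_error_code (illustrative names). */
#define PF_PROT  0x01   /* 1: protection fault, 0: no page found */
#define PF_WRITE 0x02   /* 1: write access, 0: read access */
#define PF_USER  0x04   /* 1: user-mode access, 0: kernel-mode access */
#define PF_INSTR 0x10   /* 1: fault was an instruction fetch */

/* Write a short description of the error code into buf. */
void describe_pf(unsigned int code, char *buf, size_t len)
{
        snprintf(buf, len, "%s from %s space, %s",
                 (code &amp;amp; PF_INSTR) ? "instruction fetch" :
                 (code &amp;amp; PF_WRITE) ? "write" : "read",
                 (code &amp;amp; PF_USER) ? "user" : "kernel",
                 (code &amp;amp; PF_PROT) ? "protection fault" : "no page found");
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With these definitions, an error code of 4 is described as "read from user
space, no page found" and 6 as "write from user space, no page found". A
value of 7 would be a write from user space to a page that is mapped but
not writable.&lt;/p&gt;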
&lt;p&gt;Moreover, the &lt;em&gt;faulty memory address&lt;/em&gt; in &lt;code&gt;dmesg&lt;/code&gt; can help you identify
the bug. For instance, if the memory address is 0, the root cause is
probably a &lt;code&gt;NULL&lt;/code&gt; pointer dereference.&lt;/p&gt;
&lt;p&gt;The name of the VMA may indicate the location of the error:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;free&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When executed, the program above triggers a segfault and the VMA name
is &lt;code&gt;libc&lt;/code&gt;. This suggests that a function from &lt;code&gt;libc&lt;/code&gt; was called with an
invalid pointer.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="go"&gt;progname[1234]: segfault at 22 ip 00007f6b2531473c sp 00007ffc7b2c5c30 error 4 in libc-2.31.so[7f6b252af000+14b000]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
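&lt;p&gt;The numbers in brackets after the VMA name can narrow the location further:
subtracting the VMA start address from the instruction pointer gives the
offset of the faulting instruction inside that VMA. The helper below is only
a sketch, &lt;code&gt;vma_offset()&lt;/code&gt; is a hypothetical name, and it assumes a 64-bit
&lt;code&gt;unsigned long&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;/*
 * Offset of the faulting instruction inside its VMA, or -1 if ip lies
 * outside the range [start, start + size).
 */
long vma_offset(unsigned long ip, unsigned long start, unsigned long size)
{
        if (ip &amp;lt; start || ip - start &amp;gt;= size)
                return -1;
        return (long)(ip - start);
}
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the message above, &lt;code&gt;0x7f6b2531473c - 0x7f6b252af000&lt;/code&gt; gives an offset of
&lt;code&gt;0x6573c&lt;/code&gt; inside the &lt;code&gt;libc&lt;/code&gt; mapping. This is an offset within the VMA, not
necessarily within the file: the message does not say at which file offset
the VMA was mapped, so treat the value as a starting point only.&lt;/p&gt;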

&lt;p&gt;The fault handler is architecture dependent, so you will not observe
the same messages in &lt;code&gt;dmesg&lt;/code&gt; on architectures other than x86. For
instance, on ARM no message is displayed unless the Linux kernel has
been built with &lt;code&gt;CONFIG_DEBUG_USER&lt;/code&gt;.&lt;/p&gt;
&lt;div class="caption"&gt;
  &lt;img src="https://www.enodev.fr/images/poutres-fondation-louis-vuitton.thumbnail.jpg" class="img-fluid rounded" alt="Fondation Louis Vuitton"&gt;
  &lt;p&gt;A &lt;s&gt;64-bit&lt;/s&gt; 64-beam architecture&lt;br&gt;
  &lt;small&gt;Fondation Louis Vuitton&lt;/small&gt;&lt;/p&gt;
&lt;/div&gt;</description><category>Debugging</category><category>Linux</category><category>Programming</category><category>x86</category><guid>https://www.enodev.fr/posts/decode-segfault-errors-in-dmesg.html</guid><pubDate>Sat, 14 Jun 2014 18:00:00 GMT</pubDate></item></channel></rss>