Linux: Interpreting segfault messages

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2549214/

Date: 2020-08-03 19:54:27  Source: igfitidea

Interpreting segfault messages

Tags: linux, qt, webkit, kernel, segmentation-fault

Asked by knorv

What is the correct interpretation of the following segfault messages?

segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]
segfault at 10 ip 00007fa44d78890d sp 00007fff43f6b720 error 4 in libQtWebKit.so.4.5.2[7fa44d2f8000+f6f000]
segfault at 11 ip 00007f2b0022acee sp 00007fff368ea610 error 4 in libQtWebKit.so.4.5.2[7f2aff9f7000+f6f000]
segfault at 11 ip 00007f24b21adcee sp 00007fff7379ded0 error 4 in libQtWebKit.so.4.5.2[7f24b197a000+f6f000]

Accepted answer by Charles Duffy

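This is a segfault caused by following a null pointer on a data read: error 4 is a user-mode read of an unmapped page, and the faulting addresses (10 and 11) are small offsets from zero. It is not an instruction fetch, since bit 4 of the error code is clear and ip points inside the library's mapping.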

If this were a program, not a shared library

Run addr2line -e yourSegfaultingProgram 00007f9bebcca90d (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

Since it's a shared library

You're hosed, unfortunately; it's not possible to know after the fact where the dynamic linker placed the libraries in memory. Reproduce the problem under gdb.

What the error means

Here's the breakdown of the fields:

  • address (after the "at") - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
  • ip - instruction pointer, i.e. where the code which is trying to do this lives
  • sp - stack pointer
  • error - an error code for page faults; see below for what this means on x86.

    /*
     * Page fault error code bits:
     *
     *   bit 0 ==    0: no page found       1: protection fault
     *   bit 1 ==    0: read access         1: write access
     *   bit 2 ==    0: kernel-mode access  1: user-mode access
     *   bit 3 ==                           1: use of reserved bit detected
     *   bit 4 ==                           1: fault was an instruction fetch
     */
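
For anyone who wants to pull these fields out of a log programmatically, here is a minimal Python sketch. The regular expression and the field names (addr, ip, sp, error, obj, base, size) are my own choices for illustration, not anything defined by the kernel. It assumes the message layout shown in the question, with every numeric field (including error) printed in hexadecimal.

```python
import re

# Pattern for the x86 "segfault at ..." line shown in the question.
# The group names are labels chosen for this example only.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return the fields of a segfault line as a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    fields = {"obj": m.group("obj")}
    # All numeric fields are assumed to be hexadecimal without a 0x prefix.
    for name in ("addr", "ip", "sp", "error", "base", "size"):
        value = m.group(name)
        fields[name] = int(value, 16) if value is not None else None
    return fields


line = ("segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 "
        "error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]")
info = parse_segfault(line)
print(hex(info["addr"]), hex(info["ip"]), info["error"], info["obj"])
```

The "in object[base+size]" part is optional in the pattern, on the assumption that it may be absent when the faulting ip is not inside a named mapping.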
    
Answer by sendmoreinfo

Let's go to the kernel source -- 2.6.32, for example. The message is printed by the show_signal_msg() function in arch/x86/mm/fault.c if the show_unhandled_signals sysctl is set.

"error" is not an errno nor a signal number, it's a "page fault error code" -- see definition of enum x86_pf_error_code.

"[7fa44d2f8000+f6f000]" is starting address and size of virtual memory area where offending object was mapped at the time of crash. Value of "ip" should fit in this region. With this info in hand, it should be easy to find offending code in gdb.

Answer by Tim

Error 4 means "The cause was a user-mode read resulting in no page being found." There's a tool that decodes it here.

Here's the definition from the kernel. Keep in mind that 4 means that bit 2 is set and no other bits are set. If you convert it to binary that becomes clear.

/*
 * Page fault error code bits
 *      bit 0 == 0 means no page found, 1 means protection fault
 *      bit 1 == 0 means read, 1 means write
 *      bit 2 == 0 means kernel, 1 means user-mode
 *      bit 3 == 1 means use of reserved bit detected
 *      bit 4 == 1 means fault was an instruction fetch
 */
#define PF_PROT         (1<<0)
#define PF_WRITE        (1<<1)
#define PF_USER         (1<<2)
#define PF_RSVD         (1<<3)
#define PF_INSTR        (1<<4)
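
As a rough illustration of the bit layout above, the following Python sketch decodes an error code into readable parts. The function name decode_pf_error and the wording of the descriptions are assumptions made for this example; only the bit positions come from the kernel comment.

```python
# Bit positions mirror the PF_* defines quoted above.
# Each entry: (mask, meaning when set, meaning when clear).
PF_TWO_WAY_BITS = [
    (1 << 0, "protection fault", "no page found"),
    (1 << 1, "write access", "read access"),
    (1 << 2, "user-mode access", "kernel-mode access"),
]
PF_RSVD = 1 << 3
PF_INSTR = 1 << 4


def decode_pf_error(code):
    """Return a list of human-readable descriptions for an x86 page fault code."""
    parts = [when_set if code & mask else when_clear
             for mask, when_set, when_clear in PF_TWO_WAY_BITS]
    # Bits 3 and 4 only carry meaning when they are set.
    if code & PF_RSVD:
        parts.append("use of reserved bit detected")
    if code & PF_INSTR:
        parts.append("fault was an instruction fetch")
    return parts


print(decode_pf_error(4))
```

For error 4 this lists "no page found", "read access" and "user-mode access", which matches the reading given above.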

Now then, "ip 00007f9bebcca90d" means the instruction pointer was at 0x00007f9bebcca90d when the segfault happened.

"libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]" tells you:

  • The object the crash was in: "libQtWebKit.so.4.5.2"
  • The base address of that object: "7f9beb83a000"
  • How big that object is: "f6f000"

If you take the base address and subtract it from the ip, you get the offset into that object:

0x00007f9bebcca90d - 0x7f9beb83a000 = 0x49090D
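
The same subtraction can be done for all four messages in the question. This is a small Python sketch with the ip, base and size values copied by hand from those messages; the names messages and offset_in_object are just for illustration. It also checks that each offset is smaller than the mapping size.

```python
# (ip, base, size) copied by hand from the four segfault lines in the question.
messages = [
    (0x00007f9bebcca90d, 0x7f9beb83a000, 0xf6f000),
    (0x00007fa44d78890d, 0x7fa44d2f8000, 0xf6f000),
    (0x00007f2b0022acee, 0x7f2aff9f7000, 0xf6f000),
    (0x00007f24b21adcee, 0x7f24b197a000, 0xf6f000),
]


def offset_in_object(ip, base, size):
    """Return ip - base, or None if ip is outside [base, base + size)."""
    offset = ip - base
    return offset if 0 <= offset < size else None


offsets = [offset_in_object(ip, base, size) for ip, base, size in messages]
for off in offsets:
    print(hex(off))
```

With these inputs the four crashes collapse to two distinct offsets, one shared by the faults at address 10 and one shared by the faults at address 11.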

Then you can run addr2line on it:

addr2line -e /usr/lib64/qt45/lib/libQtWebKit.so.4.5.2 -fCi 0x49090D
??
??:0

In my case it wasn't successful: either the copy I installed isn't identical to yours, or it's stripped.
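
If there are several offsets to look up, the addr2line call can be scripted. A minimal Python sketch, assuming the library path from the command above and a copy of the library that still has its symbols; the helper name addr2line_command is made up for this example.

```python
import shutil
import subprocess


def addr2line_command(obj_path, offset):
    """Build the addr2line invocation used above for a single offset."""
    return ["addr2line", "-e", obj_path, "-fCi", hex(offset)]


# Path taken from the command above; adjust it to wherever the library lives.
cmd = addr2line_command("/usr/lib64/qt45/lib/libQtWebKit.so.4.5.2", 0x49090D)
print(" ".join(cmd))

# Only run the tool if it is installed; otherwise the command is just printed.
if shutil.which("addr2line"):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

As in the run above, a stripped or mismatched library would still give "??" for the function and file.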