C++ How do you read a segfault kernel log message

Question

Asked by Sullenx

This may be a very simple question. I am attempting to debug an application which generates the following segfault error in kern.log:

kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]

Here are my questions:

Is there any documentation on what the different segfault error numbers mean? In this instance it is error 6, but I've also seen error 4 and error 5.
What is the meaning of the information at bf794ef0 ip 0805130b sp bf794ef0 and myapp[8048000+24000]?

So far I have been able to compile with symbols, and when I run x 0x8048000+24000 it returns a symbol. Is that the correct way of doing it? My assumptions so far are the following:

sp = stack pointer?
ip = instruction pointer
at = ????
myapp[8048000+24000] = address of symbol?

Answer 1

Answered by Charles Duffy

When the report points to a program, not a shared library

Run addr2line -e myapp 080513b (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

If it's a shared library

In the libfoo.so[NNNNNN+YYYY] part, NNNNNN is the address where the library was loaded. Subtract this from the instruction pointer (ip) and you'll get the offset into the .so of the offending instruction. Then you can use objdump -DCgl libfoo.so and search for the instruction at that offset. You should easily be able to figure out which function it is from the asm labels. If the .so doesn't have optimizations you can also try using addr2line -e libfoo.so <offset>.
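
The subtraction described above can be sketched in C++ as follows. This is only an illustration, not part of the original answer: the function name offset_in_mapping is hypothetical, and all values are assumed to be the hexadecimal numbers printed in the kernel log line.

```cpp
#include <cstdint>

// Hypothetical helper: given the faulting instruction pointer and the
// [base+size] pair from the kernel log line, compute the offset of the
// instruction inside the mapped object. Returns false (leaving `offset`
// untouched) when ip does not fall inside [base, base + size).
bool offset_in_mapping(std::uint64_t ip, std::uint64_t base,
                       std::uint64_t size, std::uint64_t& offset) {
    if (ip < base || ip - base >= size) {
        return false;  // the instruction pointer is outside this mapping
    }
    offset = ip - base;  // value to look up with objdump or addr2line
    return true;
}
```

With the values from the question (ip 0805130b and myapp[8048000+24000]), the call offset_in_mapping(0x0805130b, 0x08048000, 0x24000, offset) succeeds and sets offset to 0x930b.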

What the error means

Here's the breakdown of the fields:

address - the location in memory the code is trying to access (a small value such as 10 or 11 is likely an offset from a pointer that we expect to be set to a valid value but which is instead pointing to 0)
ip - instruction pointer, i.e. where the code which is trying to do this lives
sp - stack pointer
error - architecture-specific flags; see arch/*/mm/fault.c for your platform.
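
To make the field breakdown concrete, here is a minimal C++ sketch that splits one such log line into its fields. The struct SegfaultRecord and the function parse_segfault_line are hypothetical names introduced for illustration, and the sketch assumes the x86 message layout shown in the question, with every number printed in hexadecimal.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical record holding the fields of one segfault log line.
struct SegfaultRecord {
    std::uint64_t address = 0;  // value after "at": the address being accessed
    std::uint64_t ip = 0;       // instruction pointer
    std::uint64_t sp = 0;       // stack pointer
    std::uint64_t error = 0;    // architecture-specific page-fault flags
    std::string object;         // empty when there is no "in name[base+size]"
    std::uint64_t base = 0;     // address where the object is mapped
    std::uint64_t size = 0;     // size of that mapping
};

// Fills `out` and returns true if `line` contains a segfault report.
bool parse_segfault_line(const std::string& line, SegfaultRecord& out) {
    static const std::regex pattern(
        R"re(segfault at ([0-9a-fA-F]+) ip ([0-9a-fA-F]+) sp ([0-9a-fA-F]+) )re"
        R"re(error ([0-9a-fA-F]+)(?: in (\S+)\[([0-9a-fA-F]+)\+([0-9a-fA-F]+)\])?)re");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) {
        return false;
    }
    // All numeric fields are parsed as base-16 text.
    auto hex = [](const std::ssub_match& s) {
        return std::stoull(s.str(), nullptr, 16);
    };
    out = SegfaultRecord{};
    out.address = hex(m[1]);
    out.ip = hex(m[2]);
    out.sp = hex(m[3]);
    out.error = hex(m[4]);
    if (m[5].matched) {  // the "in name[base+size]" part is optional
        out.object = m[5].str();
        out.base = hex(m[6]);
        out.size = hex(m[7]);
    }
    return true;
}
```

Parsing into a struct keeps the later steps (range check, offset calculation, flag decoding) independent of the exact log format.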

Answer 2

Answered by jschmier

Based on my limited knowledge, your assumptions are correct.

sp = stack pointer
ip = instruction pointer
myapp[8048000+24000] = address

If I were debugging the problem I would modify the code to produce a core dump or log a stack backtrace on the crash. You might also run the program under GDB (or attach GDB to it).
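
As a rough sketch of the "log a stack backtrace on the crash" idea, the C++ code below installs a SIGSEGV handler that writes the call stack to stderr. It assumes Linux with glibc, whose execinfo.h provides backtrace and backtrace_symbols_fd. The two function names are hypothetical. Note that backtrace is not guaranteed to be async-signal-safe, so treat this as a debugging aid, not production code.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <signal.h>    // sigaction, SIGSEGV
#include <unistd.h>    // STDERR_FILENO

// Hypothetical handler: dump the current call stack to stderr.
extern "C" void segv_backtrace_handler(int) {
    void* frames[64];
    int count = backtrace(frames, 64);
    // Writes one line per frame directly to the descriptor, without malloc.
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    // SA_RESETHAND (set below) has already restored the default action, so
    // returning re-executes the faulting instruction and the process then
    // terminates as it would have without this handler.
}

// Hypothetical installer: returns true if the handler was registered.
bool install_segv_backtrace_handler() {
    struct sigaction action = {};
    action.sa_handler = segv_backtrace_handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESETHAND;  // one-shot: default action is restored
    return sigaction(SIGSEGV, &action, nullptr) == 0;
}
```

Call install_segv_backtrace_handler() early in main. Function names usually appear in the output only when the program is linked with -rdynamic; otherwise the raw addresses can be passed to addr2line.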

The error code is just the architectural error code for page faults and seems to be architecture specific. The codes are often documented in arch/*/mm/fault.c in the kernel source. My copy of Linux/arch/i386/mm/fault.c has the following definition for error_code:

bit 0 == 0 means no page found, 1 means protection fault
bit 1 == 0 means read, 1 means write
bit 2 == 0 means kernel, 1 means user-mode

My copy of Linux/arch/x86_64/mm/fault.c adds the following:

bit 3 == 1 means fault was an instruction fetch
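
The bit definitions for bits 0 to 2 can be applied mechanically. The C++ sketch below is an illustration with a hypothetical function name, describe_page_fault_error, that turns an x86 error code into text. Higher bits are deliberately not decoded; check the fault.c of the kernel you are running for those.

```cpp
#include <string>

// Hypothetical helper: describe bits 0-2 of an x86 page-fault error code,
// using the wording of the fault.c comment quoted above.
std::string describe_page_fault_error(unsigned long error_code) {
    std::string text;
    text += (error_code & 0x1) ? "protection fault" : "no page found";  // bit 0
    text += (error_code & 0x2) ? ", write" : ", read";                  // bit 1
    text += (error_code & 0x4) ? ", user-mode" : ", kernel";            // bit 2
    return text;
}
```

For the error numbers mentioned in the question: error 6 (binary 110) is a user-mode write to an address with no page, error 4 (binary 100) is a user-mode read with no page, and error 5 (binary 101) is a user-mode read that hit a protection fault.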

Answer 3

Answered by scripthelps

If it's a shared library
You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after the fact.

Well, it is still possible to retrieve that information, not from the binary, but from the shared object. You need the base address of the object, and that information is still available in the core dump, in the link_map structure.

So first you want to import struct link_map into GDB. To do that, compile a small program that uses it with debug symbols and add it to GDB.

link.c

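#include <link.h>
/* Dummy function whose only job is to emit debug info for struct link_map. */
void toto(void) { struct link_map *s = (struct link_map *)0x400; (void)s; }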

get_baseaddr_from_coredump.sh

#!/bin/bash
# Prints GDB commands that walk link_map in a core dump (32-bit layout assumed).

BINARY=$(which myapplication)

# Returns 0 (true) when the ELF type of $1 is not EXEC, i.e. the binary is PIE.
IsBinPIE ()
{
    readelf -h "$1" | grep 'Type' | grep "EXEC" >/dev/null || return 0
    return 1
}

# Converts the hex string in $1 to decimal and stores it in $number.
Hex2Decimal ()
{
    export number="`echo "$1" | sed -e 's:^0[xX]::' | tr 'a-f' 'A-F'`"
    export number=`echo "ibase=16; $number" | bc`
}

# Sums the PT_LOAD segment sizes of $1 into $totalsize.
GetBinaryLength ()
{
    if [ $# != 1 ]; then
        echo "Error, no argument provided"
        return 1
    fi
    IsBinPIE "$1" || { echo "ET_EXEC file, need a base_address"; return 1; }
    export totalsize=0
    # Get PT_LOAD's size segment out of Program Header Table (ELF format).
    # Assumes field 6 is MemSiz, as in 32-bit readelf -l output.
    export sizes="$(readelf -l "$1" | grep LOAD | awk '{print $6}' | tr '\n' ' ')"
    for size in $sizes
    do Hex2Decimal "$size"; export totalsize=$(expr $number + $totalsize)
    done
    # The result stays in $totalsize; a shell return status cannot exceed 255.
}

if [ $# = 1 ]; then
    echo "Using binary $1"
    IsBinPIE "$1" && echo "NOT ET_EXEC, need a base_address..."
    BINARY=$1
fi

gcc -g3 -fPIC -shared link.c -o link.so

# Assumes field 4 is the section address, as in "[NN] .got.plt PROGBITS addr".
GOTADDR=$(readelf -S "$BINARY" | grep -E '\.got\.plt[ \t]' | awk '{print $4}')

echo "First do the following command :"
echo file $BINARY
echo add-symbol-file ./link.so 0x0
read
echo "Now copy/paste the following into your gdb session with attached coredump"
# \$ keeps GDB convenience variables literal; $GOTADDR is expanded by the shell.
cat <<EOF
set \$linkmapaddr = *(0x$GOTADDR + 4)
set \$mylinkmap = (struct link_map *) \$linkmapaddr
while (\$mylinkmap != 0)
if (\$mylinkmap->l_addr)
printf "add-symbol-file .%s %#.08x\n", \$mylinkmap->l_name, \$mylinkmap->l_addr
end
set \$mylinkmap = \$mylinkmap->l_next
end
EOF

It will print the whole link_map content as a set of GDB commands.

By itself this might seem unnecessary, but with the base address of the shared object in question, you might get more information out of an address by debugging the involved shared object directly in another GDB instance. Keep the first GDB session open to have an idea of the symbol.

NOTE: the script is rather incomplete. I suspect you may need to add the following value to the second parameter of each add-symbol-file command it prints:

readelf -S $SO_PATH|grep -E '\.text[ \t]'|awk '{print $4}'

where $SO_PATH is the first argument of add-symbol-file (the $4 field assumes readelf -S prints the section address in the fourth column).

Hope it helps