Original URL: http://stackoverflow.com/questions/2179403/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How do you read a segfault kernel log message
Asked by Sullenx
This may be a very simple question. I am attempting to debug an application which generates the following segfault error in kern.log:
kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]
Here are my questions:
- Is there any documentation on what the different error numbers in a segfault message mean? In this instance it is error 6, but I've also seen error 4 and 5.
- What is the meaning of the information at bf794ef0 ip 0805130b sp bf794ef0 and myapp[8048000+24000]?
So far I was able to compile with symbols, and when I do an x 0x8048000+24000 it returns a symbol. Is that the correct way of doing it? My assumptions thus far are the following:
- sp = stack pointer?
- ip = instruction pointer
- at = ????
- myapp[8048000+24000] = address of symbol?
Answered by Charles Duffy
When the report points to a program, not a shared library
Run addr2line -e myapp 080513b (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.
If it's a shared library
In the libfoo.so[NNNNNN+YYYY] part, the NNNNNN is where the library was loaded. Subtract this from the instruction pointer (ip) and you'll get the offset into the .so of the offending instruction. Then you can use objdump -DCgl libfoo.so and search for the instruction at that offset. You should easily be able to figure out which function it is from the asm labels. If the .so doesn't have optimizations you can also try using addr2line -e libfoo.so <offset>.
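As a rough sketch of the subtraction described above, the offset can be computed with plain shell arithmetic. The helper name fault_offset is hypothetical, and the sample values come from the log line in the question (where the faulting object happens to be the program itself; the arithmetic is the same for a .so).

```shell
#!/bin/sh
# fault_offset IP BASE: print the offset of the faulting instruction
# inside the mapped object, i.e. IP minus the load address.
# Both arguments are hex strings without a 0x prefix, as they appear in kern.log.
# (Hypothetical helper for illustration, not part of any existing tool.)
fault_offset() {
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# Values taken from "ip 0805130b ... in myapp[8048000+24000]"
fault_offset 0805130b 8048000
```

The printed offset is the value you would then search for in the objdump listing or pass to addr2line -e libfoo.so.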
What the error means
Here's the breakdown of the fields:
- address - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
- ip - instruction pointer, i.e. where the code which is trying to do this lives
- sp - stack pointer
- error - architecture-specific flags; see arch/*/mm/fault.c for your platform.
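To make the field breakdown concrete, here is a minimal sketch that splits a segfault line into named fields with sed. The function name segfault_fields and the output labels are made up for illustration, and the pattern assumes the message layout shown in the question; other kernel versions may format the line differently.

```shell
#!/bin/sh
# segfault_fields LINE: split a kernel segfault message into named fields.
# (Hypothetical helper; assumes the "segfault at A ip B sp C error D in
# NAME[BASE+SIZE]" layout quoted in the question.)
segfault_fields() {
    printf '%s\n' "$1" | sed -n \
        's/.*segfault at \([0-9a-f]*\) ip \([0-9a-f]*\) sp \([0-9a-f]*\) error \([0-9a-f]*\) in \([^[]*\)\[\([0-9a-f]*\)+\([0-9a-f]*\)].*/address=\1 ip=\2 sp=\3 error=\4 object=\5 base=\6 size=\7/p'
}

# The log line from the question
segfault_fields 'kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]'
```

Lines that do not match the pattern produce no output, so the function can be run over a whole kern.log one line at a time.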
Answered by jschmier
Based on my limited knowledge, your assumptions are correct.
- sp = stack pointer
- ip = instruction pointer
- myapp[8048000+24000] = address
If I were debugging the problem I would modify the code to produce a core dump or log a stack backtrace on the crash. You might also run the program under (or attach) GDB.
The error code is just the architectural error code for page faults and seems to be architecture specific. They are often documented in arch/*/mm/fault.c in the kernel source. My copy of Linux/arch/i386/mm/fault.c has the following definition for error_code:
- bit 0 == 0 means no page found, 1 means protection fault
- bit 1 == 0 means read, 1 means write
- bit 2 == 0 means kernel, 1 means user-mode
My copy of Linux/arch/x86_64/mm/fault.c adds the following:
- bit 3 == 1 means fault was an instruction fetch
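As an illustration, the three i386 bits listed above can be decoded with a few shell bit tests. This is only a sketch: the function name decode_fault_error is hypothetical, it deliberately covers bits 0 to 2 only, and it assumes the code is printed in hexadecimal in the log (which makes no difference for values such as 4, 5 and 6).

```shell
#!/bin/sh
# decode_fault_error CODE: describe bits 0-2 of a page-fault error code.
# CODE is the number after "error" in the log line, read here as hex
# (assumption). Hypothetical helper for illustration only.
decode_fault_error() {
    code=$(( 0x$1 ))
    # bit 0: 0 = no page found, 1 = protection fault
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection fault"; else cause="no page found"; fi
    # bit 1: 0 = read, 1 = write
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel, 1 = user-mode
    if [ $(( code & 4 )) -ne 0 ]; then mode="user-mode"; else mode="kernel"; fi
    printf '%s, %s, %s\n' "$cause" "$access" "$mode"
}

# "error 6" from the question
decode_fault_error 6
```

For the question's error 6 this reads as a user-mode write to an address with no page mapped.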
Answered by scripthelps
If it's a shared library
You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after-the-fact.
Well, there is still a way to retrieve the information, not from the binary, but from the shared object. But you need the base address of the object, and this information is still within the coredump, in the link_map structure.
So first you want to import the struct link_map into GDB. So let's compile a program that uses it with debug symbols and add it to GDB.
link.c
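#include <link.h>
/* Dummy function: exists only so the debug info contains struct link_map. */
void toto(void) { struct link_map *s = (struct link_map *)0x400; (void)s; }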
get_baseaddr_from_coredump.sh
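#!/bin/bash
# Prints GDB commands that walk the link_map list in a core dump
# (32-bit, non-PIE binary).
BINARY=$(which myapplication)

# Returns 0 (true) when the ELF type is not EXEC, i.e. the binary is PIE.
IsBinPIE ()
{
    readelf -h "$1" | grep 'Type' | grep "EXEC" >/dev/null || return 0
    return 1
}

# Converts a hex string (optional 0x prefix) to decimal, stored in $number.
Hex2Decimal ()
{
    number=$(echo "$1" | sed -e 's:^0[xX]::' | tr 'a-f' 'A-F')
    number=$(echo "ibase=16; $number" | bc)
}

# Sums the PT_LOAD segment sizes and prints the total (not called below).
GetBinaryLength ()
{
    if [ $# != 1 ]; then
        echo "Error, no argument provided"
        return 1
    fi
    IsBinPIE "$1" || { echo "ET_EXEC file, need a base_address"; return 1; }
    totalsize=0
    # Get the PT_LOAD segment sizes out of the Program Header Table (ELF format).
    # The awk field number was lost from the original; column 6 (MemSiz) of
    # `readelf -l -W` output is an assumption.
    sizes=$(readelf -l -W "$1" | grep LOAD | awk '{print $6}' | tr '\n' ' ')
    for size in $sizes
    do
        Hex2Decimal "$size"
        totalsize=$((number + totalsize))
    done
    echo "$totalsize"
}

if [ $# = 1 ]; then
    echo "Using binary $1"
    IsBinPIE "$1" && { echo "NOT ET_EXEC, need a base_address..."; exit 1; }
    BINARY=$1
fi

gcc -g3 -fPIC -shared link.c -o link.so

# Address of .got.plt: the column after the section type in `readelf -S -W`.
# The awk field number was lost from the original, so this extraction is an assumption.
GOTADDR=$(readelf -S -W "$BINARY" | grep -E '\.got\.plt[ \t]' | awk '{ sub(/^.*\.got\.plt[ \t]+/, ""); print $2 }')

echo "First do the following command :"
echo file "$BINARY"
echo add-symbol-file ./link.so 0x0
read
echo "Now copy/paste the following into your gdb session with attached coredump"
# \$ keeps the GDB convenience variables from being expanded by the shell.
cat <<EOF
set \$linkmapaddr = *(0x$GOTADDR + 4)
set \$mylinkmap = (struct link_map *) \$linkmapaddr
while (\$mylinkmap != 0)
if (\$mylinkmap->l_addr)
printf "add-symbol-file .%s %#.08x\n", \$mylinkmap->l_name, \$mylinkmap->l_addr
end
set \$mylinkmap = \$mylinkmap->l_next
end
EOF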
It will print the whole link_map content as a set of GDB commands.
By itself this might seem unnecessary, but with the base_addr of the shared object we care about, you might get some more information out of an address by directly debugging the involved shared object in another GDB instance. Keep the first gdb to have an idea of the symbol.
NOTE: the script is rather incomplete. I suspect you may need to add the following value to the second parameter of the printed add-symbol-file commands:
readelf -S $SO_PATH|grep -E '\.text[ \t]'|awk '{print }'
where $SO_PATH is the first argument of the add-symbol-file command.
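The adjustment suggested in the note could look roughly like the sketch below: add the library's .text address to its load address (l_addr) before handing it to add-symbol-file. The function name symbol_file_cmd, the library path and both addresses are hypothetical values chosen for illustration.

```shell
#!/bin/sh
# symbol_file_cmd SO_PATH L_ADDR TEXT_ADDR: print an add-symbol-file command
# whose address is the load address plus the .text section address.
# L_ADDR and TEXT_ADDR are hex strings without a 0x prefix.
# (Hypothetical helper for illustration only.)
symbol_file_cmd() {
    printf 'add-symbol-file %s 0x%x\n' "$1" "$(( 0x$2 + 0x$3 ))"
}

# Made-up example: library loaded at 0xb7e00000, .text at 0x16c40
symbol_file_cmd /lib/libfoo.so b7e00000 16c40
```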
Hope it helps