multithreading: How to use GDB to analyze crash dump files

Question

Asked by red.clover

I have a server application running under CentOS. The server answers many requests per second, but it crashes every hour or so and creates a crash dump file each time. The situation is really bad and I need to find the cause of the crash as soon as possible.

I suspect that the problem is a concurrency problem, but I'm not sure. I have access to the source code and the crash dump files, but I don't know how to use the crash dumps to pinpoint the problem.

Any suggestions are much appreciated.

Answer 1

Answered by Jonathan Leffler

If the problem takes an hour or so to manifest itself, it might be a memory problem - perhaps running out, or perhaps trampling (using already released memory, for example).

You say you've got the crash dump files - that is a core dump?

Assuming you have a core dump, then the first step should probably be to print the stack backtrace:

gdb program core
(gdb) where

This should tell you where the program was when the crash occurred. What else is available depends on how the server was compiled. If possible, you should recompile with debugging enabled (that would be with the '-g' flag with GCC). This would give you more information from the stack backtrace.
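
Because a concurrency problem is suspected, the backtrace of every thread may be more telling than that of the crashing thread alone. A minimal sketch, assuming gdb is installed; the function name dump_all_threads is made up for illustration, and the program and core paths are whatever your system produced:

```shell
# Hypothetical helper: print the thread list and a backtrace for every
# thread recorded in a core file, without starting an interactive session.
dump_all_threads() {
    prog=$1
    core=$2
    if [ ! -r "$prog" ] || [ ! -r "$core" ]; then
        echo "usage: dump_all_threads PROGRAM COREFILE" >&2
        return 2
    fi
    # -batch makes gdb exit after running the -ex commands in order.
    gdb -batch -ex "info threads" -ex "thread apply all bt" "$prog" "$core"
}
```

Redirecting the output to a file (for example, dump_all_threads ./server core.1234 > threads.txt, where both names are placeholders) makes it easier to compare the thread states from several crashes.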

If your problem is memory related, consider running with valgrind.

Also consider building and running with a debugging version of malloc(). A debugging version will detect memory abuses that normal versions miss - or crash on.
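
One debugging malloc that needs no rebuild is the checking mode built into glibc, the C library CentOS uses. A sketch, assuming glibc; run_with_malloc_check is a hypothetical wrapper name, and on recent glibc releases the checks may only take effect when the separate malloc debugging library is preloaded:

```shell
# Hypothetical wrapper: run a command with glibc's heap checking enabled.
run_with_malloc_check() {
    # MALLOC_CHECK_=3 asks glibc to report detected heap corruption and abort,
    # so the core dump is taken close to the point of detection.
    # MALLOC_PERTURB_ fills allocated and freed memory with a byte pattern
    # derived from this value, so reads of stale memory stand out.
    MALLOC_CHECK_=3 MALLOC_PERTURB_=165 "$@"
}
```

The wrapper is used as run_with_malloc_check ./server, where ./server stands for your own binary. Expect the program to run slower while the checks are active.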

Answer 2

Answered by alex tingle

The first thing to look for is the error message that you get when the program crashes. This will often tell you what kind of error occurred. For example, "segmentation fault" or "SIGSEGV" almost certainly means that your program has dereferenced a NULL or otherwise invalid pointer. If the program is written in C++, then the error message will often tell you the name of any uncaught exception.

If you aren't seeing the error message, then run the program from the command line, or pipe its output into a file.

In order for a core file to be really useful, you need to compile your program without optimisation and with debugging information. GCC needs the following options: -g -O0. (Make sure your build doesn't have any other -O options.)
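
Before opening a core file, it can help to confirm that the binary really carries debugging information. A rough check, assuming an ELF binary and readelf from binutils; has_debug_info is a made-up helper name, and the check does not see debug info that is shipped in a separate file:

```shell
# Hypothetical helper: succeed if the ELF file has a .debug_info section,
# which GCC emits when the program is compiled with -g.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}
```

For example, has_debug_info ./server && echo "debug info present", where ./server is a placeholder for your own binary.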

Once you have the core file, open it in gdb with:

gdb YOUR-APP COREFILE

Type 'where' to see the point where the crash occurred. You are basically in a normal debugging session: you can examine variables, move up and down the stack, switch between threads, and so on.

If your program has crashed, then it's probably an invalid memory access, so you need to look for a pointer that is zero or that points to bad-looking data. You might not find the problem in the innermost frame, where the crash occurred; you might have to move up the stack a few levels before you find it.
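
The inspection steps described in this answer can also be scripted. A sketch that writes a gdb command file; the file name inspect.gdb is arbitrary, and the output is only useful if the program was built with debugging symbols:

```shell
# Write a small gdb command file (gdb treats lines starting with # as comments).
cat > inspect.gdb <<'EOF'
# Backtrace of the crashing thread, including the local variables of each frame.
bt full
# Select the innermost frame and show its arguments and locals.
frame 0
info args
info locals
EOF
```

Run it with gdb -batch -x inspect.gdb YOUR-APP COREFILE. Suspicious pointers found this way can then be examined by hand in an interactive session.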

Good luck!

Answer 3

Answered by dicroce

gdb -c core.file exename
bt

Assuming exename was built with debugging symbols (and all of its dynamic dependencies are in the path), that will get you a backtrace. 'up' and 'down' will move you up and down the stack, and 'p varname' can be used to examine locals and parameters.

You could also try running it under valgrind:

valgrind --tool=memcheck --leak-check=full exename
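
Since the question suspects a concurrency problem, valgrind's helgrind tool, which reports data races and lock misuse in pthreads programs, may be worth a run as well. A sketch; race_check is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: run a program under valgrind's helgrind tool.
race_check() {
    if ! command -v valgrind >/dev/null 2>&1; then
        echo "valgrind not found in PATH" >&2
        return 127
    fi
    valgrind --tool=helgrind "$@"
}
```

Use it as race_check exename. The program runs much slower under valgrind, so a race that takes an hour to show up in production may need a synthetic load to trigger.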

Answer 4

Answered by David B

Does your app create a core file? If so, I would use gdb to debug this problem.
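
If no core file appears, the shell's core size limit and the kernel's core pattern are the usual things to check on Linux. A sketch; show_core_settings is a made-up name:

```shell
# Hypothetical helper: print the settings that decide whether and where
# a core file is written on Linux.
show_core_settings() {
    echo "core size limit: $(ulimit -c)"
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
    fi
}
```

A limit of 0 disables core dumps; running ulimit -c unlimited in the shell that starts the server lifts it.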
