Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7250631/

Date: 2020-10-30 19:16:56  Source: igfitidea

How do I debug Segfaults occurring in the JVM when it runs my code?

Tags: java, segmentation-fault

Question by Hanno Fietz

My Java application has started to crash regularly with a SIGSEGV, leaving a dump of stack data and a load of information in a text file.

I have debugged C programs in gdb and I have debugged Java code from my IDE. I'm not sure how to approach C-like crashes in a running Java program.

I'm assuming I'm not looking at a JVM bug here. Other Java programs run just fine, and the JVM from Sun is probably more stable than my code. However, I have no idea how I could even cause segfaults with Java code. There definitely is enough memory available, and when I last checked in the profiler, heap usage was around 50% with occasional spikes around 80%. Are there any startup parameters I could investigate? What is a good checklist when approaching a bug like this?

Though I'm not so far able to reliably reproduce the event, it does not seem to occur entirely at random either, so testing is not completely impossible.

ETA: Some of the gory details

(I'm looking for a general approach, since the actual problem might be very specific. Still, there's some info I already collected and that may be of some value.)

A while ago, I had similar-looking trouble after upgrading my CI server (see here for more details), but that fix (setting -XX:MaxPermSize) did not help this time.

Further investigation revealed that, in the crash log files, the thread marked as "current thread" is never one of mine but either one called "VMThread" or one called "GCTaskThread". If it's the latter, it is additionally marked with the comment "(exited)"; if it's the former, the GCTaskThread is not in the list. This makes me suppose that the problem might be around the end of a GC operation.
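
A minimal sketch for pulling those two facts out of a crash log automatically, so that several logs can be compared side by side. It assumes the HotSpot `hs_err_pid<pid>.log` layout, where a line starting with `Current thread (0x...):` names the crashing thread and the line after `# Problematic frame:` starts with a frame-type letter (`C` native code, `j` interpreted Java, `J` compiled Java, `V`/`v` JVM code or JVM-generated stub, per the log's own legend). The exact layout varies between JVM versions, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pulls the crashing thread and the "Problematic frame" out of a HotSpot hs_err_pid*.log file. */
public class HsErrSummary {
    // Assumed line shape: "Current thread (0x00007f...):  VMThread [stack: 0x...,0x...] [id=1234]"
    private static final Pattern CURRENT_THREAD =
            Pattern.compile("^Current thread \\(0x[0-9a-fA-F]+\\):\\s+(\\S+)");

    /** Returns the thread kind after "Current thread (...):", e.g. VMThread, GCTaskThread or JavaThread. */
    static String currentThread(List<String> lines) {
        for (String line : lines) {
            Matcher m = CURRENT_THREAD.matcher(line);
            if (m.find()) {
                return m.group(1);
            }
        }
        return "unknown";
    }

    /** Returns the line following "# Problematic frame:" with the leading "#" removed. */
    static String problematicFrame(List<String> lines) {
        for (int i = 0; i + 1 < lines.size(); i++) {
            if (lines.get(i).startsWith("# Problematic frame:")) {
                return lines.get(i + 1).replaceFirst("^#\\s*", "").trim();
            }
        }
        return "unknown";
    }

    /** Maps the leading frame-type letter to its meaning. */
    static String frameKind(String frame) {
        if (frame.isEmpty()) {
            return "unknown";
        }
        switch (frame.charAt(0)) {
            case 'C': return "native code";
            case 'j': return "interpreted Java code";
            case 'J': return "compiled Java code";
            case 'V': return "JVM code";
            case 'v': return "JVM-generated stub";
            default:  return "unknown";
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "hs_err_pid1234.log"); // hypothetical file name
        // ISO-8859-1 maps every byte to a char, so odd bytes in the log cannot make decoding fail.
        List<String> lines = Files.readAllLines(log, StandardCharsets.ISO_8859_1);
        String frame = problematicFrame(lines);
        System.out.println("current thread:    " + currentThread(lines));
        System.out.println("problematic frame: " + frame + " -> " + frameKind(frame));
    }
}
```

Run it as `java HsErrSummary /path/to/hs_err_pid1234.log`. A `V` frame in a `VMThread` or `GCTaskThread` points at the JVM itself, or at heap corruption it tripped over (native code writing out of bounds often shows up this way), while a `C` frame naming a non-JDK library points directly at native code.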

Answer by jhericks

I'm assuming I'm not looking at a JVM bug here. Other Java programs run just fine, and the JVM from Sun is probably more stable than my code.

I don't think you should make that assumption. Without using JNI, you should not be able to write Java code that causes a SIGSEGV (although we know it happens). My point is, when it happens, it is either a bug in the JVM (not unheard of) or a bug in some JNI code. Even if you don't have any JNI in your own code, you may be using a library that does, so look for that. When I've seen this kind of problem before, it was in an image manipulation library. If the culprit isn't in your own JNI code, you probably won't be able to 'fix' the bug, but you may still be able to work around it.
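
One way to look for such a library on Linux is to list the shared libraries actually mapped into the running JVM, for example from `/proc/self/maps`. The sketch below assumes Linux and the standard `maps` line format (address, permissions, offset, device, inode, optional path); anything outside the JDK's own directory is a candidate for the JNI code in question. On Linux the "Dynamic libraries:" section of an hs_err crash log should use the same format, so the same filter can be fed those lines instead. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Linux-only sketch: lists the shared libraries currently mapped into this JVM process. */
public class LoadedNativeLibs {
    /** Extracts shared-object paths (".so" or versioned ".so.N") from /proc/<pid>/maps lines. */
    static Set<String> sharedObjects(List<String> mapsLines) {
        Set<String> libs = new TreeSet<>(); // sorted and de-duplicated (each library maps several regions)
        for (String line : mapsLines) {
            // Fields: address perms offset dev inode [pathname]; anonymous mappings have no 6th field.
            String[] fields = line.trim().split("\\s+", 6);
            if (fields.length == 6) {
                String path = fields[5];
                if (path.endsWith(".so") || path.contains(".so.")) {
                    libs.add(path);
                }
            }
        }
        return libs;
    }

    public static void main(String[] args) throws IOException {
        Path maps = Paths.get("/proc/self/maps"); // "self" is the JVM running this code
        for (String lib : sharedObjects(Files.readAllLines(maps))) {
            System.out.println(lib);
        }
    }
}
```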

First, you should get an alternate JVM on the same platform and try to reproduce it. You can try one of these alternatives.

If you cannot reproduce it, it is likely a JVM bug. From there, you can either mandate a particular JVM or search the bug database using what you know about how to reproduce it, and maybe find suggested workarounds. (Even if you can reproduce it, many JVM implementations are just tweaks on Oracle's HotSpot implementation, so it might still be a JVM bug.)
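
When comparing JVMs or searching a bug database, it helps to record exactly which JVM and which options were involved in each run. A small sketch using only standard system properties and `RuntimeMXBean.getInputArguments()`; the class name is hypothetical, and which properties are worth logging is a judgment call.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Prints the facts a JVM bug search usually needs: which JVM, which version, which flags. */
public class JvmFingerprint {
    static String describe() {
        StringBuilder sb = new StringBuilder();
        String[] keys = {"java.vm.vendor", "java.vm.name", "java.vm.version",
                         "java.runtime.version", "os.name", "os.arch"};
        for (String key : keys) {
            // System.getProperty returns null for a property this JVM does not define.
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        // Options passed to the JVM itself (not to main), e.g. -Xmx10g or -XX:MaxPermSize=...
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        sb.append("jvm.args=").append(flags).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Logging this once at application startup means every crash log can be matched to the exact JVM build and flags that produced it.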

If you can reproduce it with an alternative JVM, the fault might be a JNI bug. Look at which libraries you are using and what native calls they might be making. Sometimes there are alternative "pure Java" configurations or jar files for the same library, or alternative libraries that do almost the same thing.
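
To see which classes in a suspect library call into native code, one option is to look for methods declared `native` via reflection. This sketch loads the named classes without initializing them (so their static initializers, which often call `System.loadLibrary`, do not run) and prints their native methods. The class name and the choice of which classes to scan are assumptions.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Lists the methods a class declares as native, i.e. the points where it calls into JNI code. */
public class NativeMethodFinder {
    static List<String> nativeMethods(Class<?> type) {
        List<String> result = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                result.add(m.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names from a suspect library on the classpath.
        for (String name : args) {
            // initialize=false: do not run static initializers (and thus avoid loading the native library).
            Class<?> type = Class.forName(name, false, NativeMethodFinder.class.getClassLoader());
            System.out.println(name + " -> " + nativeMethods(type));
        }
    }
}
```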

Good luck!

Answer by bmargulies

The following will almost certainly be useless unless you have native code. However, here goes.

  1. Start the Java program in a Java debugger, with a breakpoint well before the point where the SIGSEGV may occur.
  2. Use the ps command to obtain the process ID of the java process.
  3. gdb /usr/lib/jvm/sun-java6/bin/java processid
  4. Make sure that gdb's 'handle' setting for SIGSEGV is set to stop.
  5. Continue from the breakpoint in the Java debugger.
  6. Wait for the explosion.
  7. Use gdb to investigate.
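
Step 2 can also be done from inside the application itself, for example by logging the process ID at startup. This sketch (Java 9+ for `ProcessHandle`; the class name is hypothetical) produces the process ID and a matching gdb command line. One caveat for step 4: HotSpot itself raises and handles SIGSEGV during normal operation (for example for implicit null checks and safepoint polling), so gdb will likely stop on several harmless faults first. The signal has to be passed on to the JVM (`pass`), and you keep typing `continue` until the fatal one arrives.

```java
import java.nio.file.Paths;

/** Builds a gdb attach command for the current JVM, to be logged by the application at startup. */
public class GdbAttachHint {
    static long currentPid() {
        return ProcessHandle.current().pid(); // Java 9+; no need to look the pid up with ps
    }

    static String gdbCommand(String javaBinary, long pid) {
        // "gdb <program> <pid>" attaches to the running process, as in step 3.
        // -ex runs a gdb command at startup: stop and print on SIGSEGV, and pass the
        // signal on so that the JVM's own handler still sees the faults it expects.
        return "gdb -ex \"handle SIGSEGV stop print pass\" " + javaBinary + " " + pid;
    }

    public static void main(String[] args) {
        String java = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        System.out.println(gdbCommand(java, currentPid()));
    }
}
```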

If you've really managed to make the JVM take a SIGSEGV without any native code of your own, you are very unlikely to make any sense of what you see next, and the best you can do is attach a test case to a bug report.

Answer by Hanno Fietz

I found a good list at http://www.oracle.com/technetwork/java/javase/crashes-137240.html. As I'm getting the crashes during GC, I'll try switching between garbage collectors.

I tried switching between the serial and the parallel GC (the latter being the default on a 64-bit Linux server), this only changed the error message accordingly.
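
To confirm which collector a run actually used after switching (for HotSpot, `-XX:+UseSerialGC` or `-XX:+UseParallelGC`), the JVM can report its own collectors through the standard management API. A sketch; the collector names in the comment are the ones HotSpot has typically reported and may differ in other versions or JVMs, and the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Shows which garbage collectors the running JVM actually uses, to confirm a GC switch took effect. */
public class ActiveCollectors {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections=" + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        // Typical HotSpot names: serial = "Copy" + "MarkSweepCompact",
        // parallel = "PS Scavenge" + "PS MarkSweep".
        collectorNames().forEach(System.out::println);
    }
}
```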

Reducing the max heap size from 16G to 10G after a fresh analysis in the profiler (which showed heap usage flattening out at 8G) did lead to a significantly lower "Virtual Memory" footprint (16G instead of 60G), but I don't even know what that means, and the Internet says it doesn't matter.
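
The "Virtual Memory" figure (VIRT in `top`) counts address space the process has reserved, which for a JVM includes the whole maximum-heap reservation plus thread stacks and mapped files, so it shrinks with `-Xmx` even when actual usage does not change. To see the heap from inside the process, here is a sketch using `Runtime` (the class name is hypothetical); note that `maxMemory()` may report somewhat less than the `-Xmx` value because of how some collectors size their spaces.

```java
/** Prints the heap limits the running JVM is actually using, to confirm an -Xmx change took effect. */
public class HeapLimits {
    static final long MB = 1024L * 1024L;

    static String describe(Runtime rt) {
        long max = rt.maxMemory();          // upper bound derived from -Xmx (Long.MAX_VALUE if no limit)
        long committed = rt.totalMemory();  // heap currently committed by the JVM
        long used = committed - rt.freeMemory();
        return String.format("max=%dMB committed=%dMB used=%dMB", max / MB, committed / MB, used / MB);
    }

    public static void main(String[] args) {
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```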

Currently, the JVM is running in client mode (using the -client startup option, thus overriding the default of -server). So far there has been no crash, but the performance impact seems rather large.
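
It is worth confirming that `-client` actually took effect: at least for Java 8, Oracle's `java` launcher documentation says 64-bit JDKs ignore `-client` and use the Server VM instead. A sketch that reads the standard `java.vm.name` property; matching on the words "Server" and "Client" is an assumption based on HotSpot's usual VM names, and the class name is hypothetical.

```java
/** Reports whether HotSpot's client or server VM is running, e.g. after passing -client. */
public class VmMode {
    static String mode(String vmName) {
        if (vmName == null) {
            return "unknown";
        }
        if (vmName.contains("Server")) {
            return "server";
        }
        if (vmName.contains("Client")) {
            return "client";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name"); // e.g. "Java HotSpot(TM) 64-Bit Server VM"
        System.out.println(vmName + " -> " + mode(vmName));
    }
}
```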

Answer by Alan Burlison

If you have a core file, you could try running jstack on it, which would give you something a little more comprehensible (see http://download.oracle.com/javase/6/docs/technotes/tools/share/jstack.html), although if it's a bug in the GC thread it may not be all that helpful.

Answer by Rohit

Check whether a crash in C (native) code is what caused the Java crash. Use Valgrind to find invalid memory accesses, and also cross-check the stack size.
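
On the Java side, one way to cross-check stack size is to rerun the suspect work on a thread with a deliberately larger stack, rather than raising `-Xss` for every thread. The sketch uses the four-argument `Thread` constructor, whose Javadoc says the requested stack size is only a hint that a JVM may ignore; JNI code called from that thread runs on the same stack. The recursive `depth` workload and the class name are hypothetical stand-ins for the real code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

/** Runs a task on a dedicated thread with an explicitly requested stack size. */
public class BigStackRunner {
    static <T> T runWithStack(long stackBytes, Callable<T> task)
            throws InterruptedException, ExecutionException {
        FutureTask<T> future = new FutureTask<>(task);
        // Thread(group, target, name, stackSize): stackSize is a hint the JVM may ignore.
        Thread worker = new Thread(null, future, "big-stack-worker", stackBytes);
        worker.start();
        return future.get(); // waits for the result; task exceptions arrive wrapped in ExecutionException
    }

    /** Hypothetical workload: plain recursion, one stack frame per level. */
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    public static void main(String[] args) throws Exception {
        int reached = runWithStack(64L * 1024 * 1024, () -> depth(100_000));
        System.out.println("depth reached: " + reached);
    }
}
```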
