Original source: http://stackoverflow.com/questions/2299250/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Java VM: reproducible SIGSEGV on both 1.6.0_17 and 1.6.0_18, how to report?
Asked by SyntaxT3rr0r
EDIT: This reproducible SIGSEGV happens on a Linux machine with more than one proc and more than 2GB of mem, so Java is defaulting to the -server mode. Interestingly enough if I force "-client" there's no crash anymore... (I'm still not too sure what to do with my reproducible SIGSEGV but it's interesting nonetheless).
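To confirm which VM flavour is actually picked by default on a given machine, a small check along these lines could be run with and without "-client". This is only a sketch: the class name VmModeReport and the classify helper are made up for illustration, and the classification simply looks for "server" or "client" in the java.vm.name system property.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

class VmModeReport {
    // Classifies a java.vm.name value such as "Java HotSpot(TM) Server VM".
    // Hypothetical helper: it only looks for the words "server" or "client".
    static String classify(String vmName) {
        if (vmName == null) {
            return "unknown";
        }
        String lower = vmName.toLowerCase(Locale.ROOT);
        if (lower.contains("server")) {
            return "server";
        }
        if (lower.contains("client")) {
            return "client";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        Runtime rt = Runtime.getRuntime();
        System.out.println("VM name:    " + vmName);
        System.out.println("VM mode:    " + classify(vmName));
        System.out.println("Processors: " + rt.availableProcessors());
        System.out.println("Max heap:   " + (rt.maxMemory() / (1024 * 1024)) + " MB");
        // The arguments the JVM was started with (e.g. -client or -server, if given).
        System.out.println("VM args:    " + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Running it once as "java VmModeReport" and once as "java -client VmModeReport" should show whether the default differs from the forced mode on that box.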
First note that this is a bit related but not identical to the following because in our case it's only a SIGSEGV that happens, and we can reliably trigger it:
JVM OutOfMemory error "death spiral" (not memory leak)
It's related because it happens when we feed our app with a "deluge of data": data are coming from text files and then number-crunched (yes, financial number crunching in Java).
I can reliably trigger a JVM to SIGSEGV using only valid Java code.
NOTE: I can invariably crash both JVM 1.6.0_17 and JVM 1.6.0_18, and this question is not about how to work around this issue (for example, playing with VM parameters may fix the issue, but I'm not after that; I want to know what to do with this always-reproducible SIGSEGV).
I've got a workaround which simply consists in using Java 1.5 when launching our app (while still using Java 1.6 to run IntelliJ IDEA, etc. on the same machine, simultaneously), but my question is if this should be reported or not and, if it should, how to report it knowing that the log itself contains proprietary information (the full hs_err_..._log).
A hardware error can be ruled out because:
- This is happening on a workstation that regularly reaches months of uptime (I only reboot it when critical security patches affecting my trimmed-down and hardened Debian Linux are issued, which really doesn't happen often) and on which applications never crash (making it very unlikely that it's a hardware issue on that machine [more below]).
- The same application works perfectly on that same machine under a 1.5 JVM under the same load (this is how I'm testing the app: I simply launch it under a 1.5 VM).
- The same application works perfectly fine on more than one hundred client machines under the same (gigantic) load (it never crashed once on Windows + JVM 1.5 or 1.6 and never crashed once on OS X + JVM 1.5 or 1.6 [a crash would mean an instant phone call from the client]).
- Other applications on that same machine, under the same 1.6.0_17 or 1.6.0_18 JVM, never crash (for example, I've got two instances of IntelliJ IDEA running as two different users on that same machine and they don't crash).
- The machine is tested with memtest "regularly" (before installing a new OS, which last happened when I installed Debian Lenny, not that long ago).
Here's the reproducible-on-demand SIGSEGV:
... $ uname -a
Linux saturn 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686 GNU/Linux
... $ export PATH=/home/wizard/jdk1.6.0_17/bin:$PATH
... $ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
Launch the app, feed it a "deluge of data", wait a few seconds...
Then, invariably, for 1.6.0_17:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xb76d0080, pid=30793, tid=2514328464
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Server VM (14.3-b01 mixed mode linux-x86 )
# Problematic frame:
# V [libjvm.so+0x4bc080]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid30793.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
(note that the line '[libjvm.so+0x4bc080]' is consistent for 1.6.0_17 at every SIGSEGV)
or for 1.6.0_18:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xb77468f0, pid=722, tid=2514516880
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode linux-x86 )
# Problematic frame:
# V [libjvm.so+0x4d88f0]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid722.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted
(note that the line "[libjvm.so+0x4d88f0]" is consistent for 1.6.0_18 at every SIGSEGV)
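One way to check that the problematic frame really is the same at every crash is to pull it out of each saved hs_err file and compare. The following is a rough sketch only: the class name CrashHeader and both regular expressions are my own, and they assume the log header looks like the console output shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrashHeader {
    // Matches a header line such as "# V [libjvm.so+0x4bc080]" and captures the part in brackets.
    private static final Pattern FRAME =
            Pattern.compile("^#\\s*[A-Za-z]\\s+\\[([^\\]]+)\\]", Pattern.MULTILINE);
    // Matches a header line such as "# SIGSEGV (0xb) at pc=0xb76d0080, ..." and captures the signal name.
    private static final Pattern SIGNAL =
            Pattern.compile("^#\\s*(SIG[A-Z]+) \\(0x[0-9a-f]+\\) at pc=", Pattern.MULTILINE);

    // Returns e.g. "libjvm.so+0x4bc080", or null if no frame line is found.
    static String problematicFrame(String header) {
        Matcher m = FRAME.matcher(header);
        return m.find() ? m.group(1) : null;
    }

    // Returns e.g. "SIGSEGV", or null if no signal line is found.
    static String signal(String header) {
        Matcher m = SIGNAL.matcher(header);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of one hs_err_pid*.log file.
        for (String path : args) {
            String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
            System.out.println(path + ": " + signal(text) + " in " + problematicFrame(text));
        }
    }
}
```

Invoked with several log files as arguments, it prints one line per crash, so identical library offsets for the same JVM build are easy to spot.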
The problem is that the log file contains proprietary information that cannot be shared.
Producing a "tiny test case" that reproduces the issue ain't realistic either: as with the issue linked above, this only happens when a "deluge of data" is fed to the app.
Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.
But this doesn't mean the JVM isn't at fault: it could still be a JVM issue.
Should I report this and how? (keeping in mind that writing a "reproducible tiny test case" is delusional and that the log contains proprietary information that shouldn't be leaked). Should I just edit the log and send it?
What's the procedure to report such a reproducible SIGSEGV when your log contains proprietary information and when a test case reproducing the issue ain't realistically doable?
Did any of you have success opening such a bug and then see it solved in a subsequent Java release?
Do you think it's good "for the Java community" to report such an issue, or should I just not bother because it's not important?
Accepted answer by glenti
I had a similar problem when upgrading to JDK 1.6_18, and it seems to be solved by using the following options:
-server
-Xms256m
-Xmx748m
-XX:MaxPermSize=128m
-verbose:gc
-XX:+PrintGCTimeStamps
-Xloggc:/tmp/gc.log
-XX:+PrintHeapAtGC
-XX:+PrintGCDetails
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"
-XX:+UseParallelGC
-XX:-UseGCOverheadLimit
# The following options just enable remote monitoring with jconsole, useful to see JVM behaviour at runtime
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=12345
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=MyHost
I still haven't double-checked (it is a production environment), but I think the error had two causes:
1) A wrong heap and/or permanent space setting (I think JDK 1.6 needs more heap and permanent space than previous JVM versions) caused an OutOfMemoryError, but
2) in the wrong original settings somebody wrote
-XX:+HeapDumpOnOutOfMemoryError="/tmp"
and not
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"
so probably the JVM was not able to write the heap dump and we only got a SIGSEGV (previous versions wrote the heap dump in the working directory).
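A mistake like the one in 2) can be caught mechanically: a boolean -XX flag (one starting with -XX:+ or -XX:-) should not carry an "=value" part. Here is a minimal sketch of such a check. The class name FlagCheck and the rule itself are my own simplification, not something the JVM provides, and the check can be applied to any list of option strings, for example the ones in a launch script.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class FlagCheck {
    // Returns the options that look like boolean -XX flags with a value attached,
    // e.g. -XX:+HeapDumpOnOutOfMemoryError="/tmp". Simplified rule, assumed for illustration.
    static List<String> suspicious(List<String> jvmArgs) {
        List<String> bad = new ArrayList<String>();
        for (String arg : jvmArgs) {
            boolean booleanFlag = arg.startsWith("-XX:+") || arg.startsWith("-XX:-");
            if (booleanFlag && arg.contains("=")) {
                bad.add(arg);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Checks the options this JVM itself was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> bad = suspicious(jvmArgs);
        if (bad.isEmpty()) {
            System.out.println("No suspicious boolean -XX flags found.");
        } else {
            System.out.println("Suspicious flags: " + bad);
        }
    }
}
```

Valued options such as -XX:HeapDumpPath="/tmp" or -XX:MaxPermSize=128m are left alone, since they do not start with -XX:+ or -XX:-.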
Check the -server -XX:+UseParallelGC -XX:-UseGCOverheadLimit options too. I think playing with VM parameters is not a workaround but the right approach, also because the garbage collector (and not only that) changed between 1.5 and 1.6.
Answer by Kevin
The problem is that the log file contains proprietary information that cannot be shared. Producing a "tiny test case" that reproduces the issue ain't realistic either
If you can't provide Sun with a reproducible test case, they won't even look at it. Chances are good that they will ignore it even if you do provide a usable test case. The bug submission process at Sun leaves a lot to be desired.
Should I report this and how?
If you can't come up with a reproducible test case, don't bother. If they can't reproduce the issue, what do you expect them to do?
Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.
Does it work on a different box with the same hardware and same version of Linux?
Answer by Thorbjørn Ravn Andersen
The very first question you should ask yourself is:
- Am I using an officially supported Linux distribution?
If not, switch to one that is.
If you are, then report it to Sun!
Answer by matt b
If it helps, the bug submission link in your crash report has this disclaimer:
In addition, Sun Microsystems respects your desire for privacy. Personal data collected from this program will not be sold, given or shared with organizations external to Sun. We will use this data for communications with you to clarify issues regarding the report you submitted and/or status of that report. The issues that you report may be made available to other JDC Members or Sun customers, however your personal data will be kept confidential. If you are not comfortable with the above conditions, please do not press the Submit button. If you have any questions, please refer to our Privacy Policy.
Personally, I would report it if it were feasible to hand over the code segment in question along with the logs, and if the data is not too sensitive (perhaps data can be masked or obfuscated in the logs?).
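As a rough idea of what masking could look like, the sketch below rewrites a log line by line, hiding home directory names and the values of -D system properties. The class name LogMasker and both patterns are assumptions of mine about what might count as sensitive. A real hs_err log can leak information in other places too (class names in stack frames, environment variables, and so on), so the result would still need a manual review before sending.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class LogMasker {
    // Hypothetical rules: what counts as sensitive is an assumption, adjust as needed.
    private static final Pattern HOME = Pattern.compile("/home/[^/\\s]+");
    private static final Pattern SYSPROP = Pattern.compile("(-D[\\w.]+=)\\S+");

    // Masks user names in /home/<user> paths and the values of -Dkey=value options.
    static String mask(String line) {
        String out = HOME.matcher(line).replaceAll("/home/REDACTED");
        return SYSPROP.matcher(out).replaceAll("$1REDACTED");
    }

    public static void main(String[] args) throws IOException {
        // Reads the log from standard input and writes the masked copy to standard output.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(mask(line));
        }
    }
}
```

It could be used as "java LogMasker < hs_err_pid30793.log > hs_err_masked.log", keeping the original file untouched.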
It's impossible for you to really judge if the bug is "important" or not for others unless you can know what actually causes it. Reporting it might be the first step in Sun's engineers finding out the cause of something serious.