No hs_err_pid.log file created and core dumped from jvm on Solaris

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/7555371/



Tags: java, jvm, solaris, segmentation-fault, jvm-crash

Asked by pkk

Problem description


After running my Java server application for a while I am experiencing strange behaviour of the Oracle Java virtual machine on Solaris. Normally, when the JVM crashes, an hs_err_pid.log file gets created (its location is determined by the -XX:ErrorFile JVM parameter, as explained here: How can I suppress the creation of the hs_err_pid file?).

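For reference, the error-file flag is typically passed on the command line like this (the path below is only an illustration; HotSpot replaces %p in the name with the process id):

    java -XX:ErrorFile=/var/log/java/hs_err_pid%p.log -server com.example.MyApplication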

But in my case the file was not created; the only thing left was the core dump file.


Using the standard Solaris tools pstack and pflags I was able to gather more information about the crash (included below) from the core file.


Tried solutions


  • Tried to find all hs_err_pid.log files across the file system, but nothing could be found (even outside the application working directory), i.e.:

    find / -name "hs_err_pid*"

  • I tried to find known JVM bugs matching this behaviour, but I couldn't find anything similar to this case.

  • The problem looks somewhat similar to: Java VM: reproducable SIGSEGV on both 1.6.0_17 and 1.6.0_18, how to report? But I still cannot confirm this, since the hs_err_pid.log file is missing and, of course, the OS platform is different.
  • (EDIT) As suggested in one of the answers to the Tool for analyzing java core dump question, I have extracted a heap dump from the core file using jmap (see the invocation sketched after this list) and analysed it with Eclipse MAT. I found a leak (elements added to a HashMap and never removed; 1.4 M elements at the time of the core dump). This, however, explains neither why the hs_err_pid.log file was not generated nor why the JVM crashed.
  • (EDIT2) As suggested by Darryl Miles, the -Xmx limits have been checked (the Test class, a minimal version of which is sketched after this list, indefinitely added objects to a LinkedList):
    • java -Xmx1444m Test results in java.lang.OutOfMemoryError: Java heap space,
    • java -Xmx2048m Test results in java.lang.OutOfMemoryError: Java heap space,
    • java -Xmx3600m Test results in a core dump.
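
For completeness, a heap dump can be extracted from a core file with an invocation along these lines (paths are illustrative; jmap has to be pointed at the same java binary that produced the core):

    jmap -dump:format=b,file=heap.hprof /usr/jdk/instances/jdk1.6.0/bin/java core

The resulting heap.hprof file can then be opened in Eclipse MAT.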
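A minimal sketch of what the Test class above does (the actual test code may have differed in detail; this simply fills a LinkedList until the heap is exhausted):

    import java.util.LinkedList;

    public class Test {
        public static void main(String[] args) {
            // keep allocating until either OutOfMemoryError is thrown or,
            // with a too-large -Xmx in a 32-bit process, the JVM dies with a core dump
            LinkedList<byte[]> list = new LinkedList<byte[]>();
            while (true) {
                list.add(new byte[1024]);
            }
        }
    }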

The question


Has anyone experienced a similar problem with the JVM, and how should one proceed in such cases to find out what actually happened (i.e. in what cases does the JVM dump core without creating an hs_err_pid.log file)?


Any tip or pointer towards resolving this would be very helpful.


Extracted flags


# pflags core
...
/2139095:      flags = DETACH
    sigmask = 0xfffffeff,0x0000ffff  cursig = SIGSEGV

Extracted stack


# pstack core
...
-----------------  lwp# 2139095 / thread# 2139095  --------------------
 fb208c3e ???????? (f25daee0, f25daec8, 74233960, 776e3caa, 74233998, 776e64f0)
 fb20308d ???????? (0, 1, f25db030, f25daee0, f25daec8, 7423399c)
 fb20308d ???????? (0, 0, 50, f25da798, f25daec8, f25daec8)
 fb20308d ???????? (0, 0, 50, f25da798, 8561cbb8, f25da988)
 fb203403 ???????? (f25da988, 74233a48, 787edef5, 74233a74, 787ee8a0, 0)
 fb20308d ???????? (0, f25da988, 74233a78, 76e2facf, 74233aa0, 76e78f70)
 fb203569 ???????? (f25da9b0, 8b5b400, 8975278, 1f80, fecd6000, 1)
 fb200347 ???????? (74233af0, 74233d48, a, 76e2fae0, fb208f60, 74233c58)
 fe6f4b0b __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (74233d44, 74233bc8, 74233c54, 8b5b400) + 1a3
 fe6f4db3 __1cCosUos_exception_wrapper6FpFpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v2468_v_ (fe6f4968, 74233d44, 74233bc8, 74233c54, 8b5b4
00) + 27
 fe6f4deb __1cJJavaCallsEcall6FpnJJavaValue_nMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (74233d44, 8975278, 74233c54, 8b5b400) + 2f
 fe76826d __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (74233d44, 897526c, fed2d464, fed2d6d0, 7
4233c54, 8b5b400) + c1
 fe76f4fa __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (74233d44, 8975268, 897526c, fed2d464, fed2d6d0, 8b5b
400) + 7e
 fe7805f6 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (8b5b400, 8b5b400) + d2
 fe77cbe4 __1cKJavaThreadRthread_main_inner6M_v_ (8b5b400) + 4c
 fe77cb8e __1cKJavaThreadDrun6M_v_ (8b5b400) + 182
 feadbd59 java_start (8b5b400) + f9
 feed59a9 _thr_setup (745c5200) + 4e
 feed5c90 _lwp_start (745c5200, 0, 0, 74233ff8, feed5c90, 745c5200)

System information:


# uname -a
SunOS xxxx 5.10 Generic_137138-09 i86pc i386 i86pc
# java -version
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Server VM (build 11.0-b16, mixed mode)
# ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 10240
coredump(blocks) unlimited
nofiles(descriptors) 256
memory(kbytes) unlimited

JVM arguments used:


java -Xms1024M -Xmx2048M -verbose:gc -Xloggc:logs/gc.log -server com.example.MyApplication

Please comment if you find some information missing; I'll try to add it.


Accepted answer by bestsss

6.0_11 is quite old and I have no recent experience with it; I really recommend upgrading there...


However, no crash dump may be produced when the stack overflow happens in native code, i.e. when some native function (like the write of FileOutputStream; sockets use the same impl) is called with very little stack left. So, even though the JVM attempts to write the file, there is not enough stack and the writing code crashes as well. The second stack overflow just bails out the process.

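A rough, hypothetical sketch of code that can end up in this situation (class name and details are made up for illustration; whether it actually produces a core without an hs_err_pid.log file depends on the platform and on where exactly the overflow lands):

    import java.io.FileOutputStream;
    import java.io.IOException;

    public class DeepNativeWrite {
        static FileOutputStream out;

        // every frame performs a native write, so the eventual stack overflow
        // is likely to be hit inside native code rather than in Java bytecode
        static void recurse(int depth) throws IOException {
            out.write(depth & 0xFF);
            recurse(depth + 1);
        }

        public static void main(String[] args) throws IOException {
            out = new FileOutputStream("/dev/null");
            recurse(0);
        }
    }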

I did have a similar case (no file created) on a production system and it was not pretty to trace, but the above explains the reason.


Answer by Darryl Miles

As per my comments above, I believe this issue is caused by running out of usable heap in the 32-bit address space through setting too high a -Xmx value. This forced the kernel to police the limit (by denying requests for new memory) before the JVM could police it (via its controlled OutOfMemoryError mechanism). Unfortunately I do not know the specifics of Intel Solaris well enough to say what is to be expected from that platform.


But as a general rule for Windows, a maximum -Xmx might be 1800M, reduced by 16M per additional application thread you create, since each thread needs stack space (both native and Java stack) as well as other per-thread accounting such as Thread Local Storage, etc. The result of this calculation should give you an approximation of the realistic usable heap space of a Java VM in any 32-bit process whose operating system uses a 2G/2G (user/kernel) split.

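As a rough worked example of that rule of thumb (the 1800M baseline and 16M-per-thread figures are the approximations quoted above, not exact values):

    usable -Xmx  ~  1800M - (application threads x 16M)
    e.g. with 50 threads:  1800M - 50 x 16M = 1000M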

With WinXP and above it is possible to use the /3GB switch on the kernel to get a higher split (3G/1G user/kernel), and on Linux the /proc/<pid>/maps file lets you see exactly how the address space of a given process is laid out (if you were running this application there, you could watch over time as the [heap] grows until it meets the shared file mappings used for .text/.rodata/.data/etc... from DSOs, at which point the kernel denies requests to grow the heap).

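On Solaris (the platform in question) a comparable view of a live process's address space can be had with pmap, e.g.:

    pmap -x <pid>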

This problem goes away on 64-bit because there is so much more address space to use, and you will run out of physical and virtual (swap) memory before the heap meets the other mappings.


I believe 'truss' on Solaris would have shown a brk/sbrk system call returning an error code shortly before the core dump. Parts of the standard native libraries are coded to never check the return code of requests for new memory, and as a result crashes can be expected.

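To check for this, a truss invocation along these lines could be attached to the running JVM before the next crash (options may need adjusting for the Solaris release in use):

    truss -f -t brk,mmap -o /tmp/truss.out -p <pid>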