Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/32262946/

Time: 2020-11-02 19:54:48  Source: igfitidea

Java periodically hangs at futex and very low IO output

Tags: java, io, futex, huge-pages

Asked by bforevdr

My application currently blocks in IO periodically, and its output rate is very low. I used a few commands to trace the process.

Using jstack, I found that the app is hanging in FileOutputStream.writeBytes.
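To put numbers on a stall like this, it can help to time individual write calls from inside the JVM. The following is a minimal sketch, not the asker's code: the class name WriteLatencyProbe, the temp file, and the chunk sizes are all hypothetical. It only measures how long each FileOutputStream.write call takes and reports the slowest one.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical probe, not part of the original application.
public class WriteLatencyProbe {

    // Writes `count` chunks of `chunkSize` bytes to `file` and returns the
    // duration of the slowest single write() call, in nanoseconds.
    static long slowestWriteNanos(File file, int count, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        long slowest = 0L;
        try (FileOutputStream out = new FileOutputStream(file)) {
            for (int i = 0; i < count; i++) {
                long start = System.nanoTime();
                out.write(chunk);
                long elapsed = System.nanoTime() - start;
                if (elapsed > slowest) {
                    slowest = elapsed;
                }
            }
        }
        return slowest;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp file on the same filesystem the app writes to.
        File file = File.createTempFile("write-probe", ".bin");
        file.deleteOnExit();
        long slowest = slowestWriteNanos(file, 1000, 64 * 1024);
        System.out.printf("slowest write: %.3f ms%n", slowest / 1e6);
    }
}
```

Running it once while the application is healthy and once during a stall would show whether plain file writes from a fresh JVM are also slow, which helps separate a system-wide problem from one specific to the application.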

Using strace -f -c -p pid to collect syscall info, I found the following. In the normal state, the process makes both futex and write syscalls. In the abnormal state, there are only futex syscalls. The app keeps calling futex, but the calls all fail with ETIMEDOUT, like this:

<futex resumed> = -1 ETIMEDOUT (Connection timed out)
futex(0x7f823, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f824, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME) = -1 <unfinished>
<futex resumed> = -1 ETIMEDOUT (Connection timed out)
futex(0x7f823, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f824, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME) = -1 <unfinished>
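For comparison, a healthy JVM thread that simply sleeps with a timeout may produce a similar-looking trace. The sketch below is hypothetical (the class TimedWaitDemo is not from the original post) and rests on an assumption about JVM and libc internals: on Linux, a timed LockSupport.parkNanos is commonly implemented on top of a timed condition wait, which strace tends to show as a futex wait that ends in ETIMEDOUT. If that assumption holds on the system in question, the futex lines alone would not prove that anything is broken.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo, not part of the original application.
public class TimedWaitDemo {

    // Parks the calling thread `rounds` times, for up to `millis` ms each,
    // and returns the number of rounds completed. Nobody unparks the thread,
    // so each round normally ends because the timeout expires.
    static int timedWaits(int rounds, long millis) {
        int completed = 0;
        for (int i = 0; i < rounds; i++) {
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(millis));
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        // Attach strace -f to this process while it runs to compare traces.
        System.out.println("rounds completed: " + timedWaits(50, 100));
    }
}
```

If the trace of this program matches the one above, the more telling symptom is the absence of write syscalls rather than the presence of timed-out futex calls.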

This issue happens periodically, lasts for minutes or hours, and then goes back to normal.

In particular, when the app is blocked in IO, running echo 3 > /proc/sys/vm/drop_caches always makes it return to normal temporarily. I googled it and found some similar problems, listed below.

  1. Leap second. That doesn't work for us; our system's ntpd is stopped.
  2. Transparent hugepage bug: https://bugzilla.redhat.com/show_bug.cgi?id=879801 This is very similar to my problem, but my khugepaged process is normal and the load is always nearly zero. Notably, drop_caches works for my application too, and my system is also multi-core with large memory. It doesn't work for me.

Has anyone met the same problem, or is anyone familiar with this issue?

Some info about my system. OS: Red Hat 6.1, kernel version 2.6.31

JDK: 1.7.0_05

CPU: X5650, 24 cores

Memory: 24GB and 48GB

Answered by Guy Sela

Maybe it's the kernel bug in futex_wait()?

You can read about it here: https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64

Answered by Aex Aey

In addition to clock jumps and the aforementioned (rather old) THP kernel bug, another common reason for Java to block unexpectedly on IO is reading from the very slow, blocking /dev/random, which some libraries prefer over the more commonly used and much better performing /dev/urandom.

An easy way to tell whether that is the culprit:

sudo mv /dev/random /dev/random.real
sudo ln -s /dev/urandom /dev/random

...then restart the app and see if it stops blocking on IO. Once done with the test, you probably want to restore /dev/random:

sudo mv /dev/random.real /dev/random

...and open a bug with the application vendor asking them to use /dev/urandom where appropriate.
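Before swapping device nodes, it may be worth checking which entropy source the JVM is configured to use. The sketch below is an illustration only: the class EntropySourceCheck and its helper names are hypothetical, while the securerandom.source security property and the java.security.egd system property are the standard JDK settings that influence seeding. It prints both settings and times one seed request, which can block on a system that is short on entropy.

```java
import java.security.SecureRandom;
import java.security.Security;

// Hypothetical diagnostic, not part of the original answer.
public class EntropySourceCheck {

    // Renders an unset property in a readable way.
    static String describe(String value) {
        return value == null ? "(not set)" : value;
    }

    // Requests `numBytes` of seed material and returns the elapsed time in
    // milliseconds. This call may block if the seed source is slow.
    static long seedMillis(int numBytes) {
        long start = System.nanoTime();
        new SecureRandom().generateSeed(numBytes);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println("securerandom.source = "
                + describe(Security.getProperty("securerandom.source")));
        System.out.println("java.security.egd   = "
                + describe(System.getProperty("java.security.egd")));
        System.out.println("generateSeed(16) took " + seedMillis(16) + " ms");
    }
}
```

A commonly cited alternative to replacing the device node is to start the JVM with -Djava.security.egd=file:/dev/./urandom. Whether that changes anything depends on the JDK version and on which SecureRandom algorithm the libraries request, so treat it as something to test rather than a guaranteed fix.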