Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address and author information. StackOverflow original: http://stackoverflow.com/questions/4016356/

Time: 2020-08-14 08:28:11  Source: igfitidea

Java blocking issue: Why would JVM block threads in many different classes/methods?

java garbage-collection locking blocking concurrent-programming

Asked by user331465

Update: This looks like a memory issue. A 3.8 GB hprof file indicated that the JVM was dumping its heap when this "blocking" occurred. Our operations team saw that the site wasn't responding, took a stack trace, then shut down the instance. I believe they shut down the site before the heap dump finished. The log had no errors, exceptions, or other evidence of problems, probably because the JVM was killed before it could generate an error message.

Original Question

We recently had a situation where the application appeared, to the end user, to hang. We captured a stack trace before the application was restarted, and the results surprised me: of 527 threads, 463 were in thread state BLOCKED.
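
A tally like "463 of 527 threads BLOCKED" can also be taken from inside the running JVM, which makes it easier to compare several samples over time. The following is a minimal sketch and not part of the original question: the class name ThreadStateTally is hypothetical, it relies only on the standard java.lang.management API, and it assumes Java 8 or later (on the JDK 6 era JVM in the question, the lambda and Map.merge call would have to be rewritten as plain loops).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper (not from the original post): tallies live threads by state. */
final class ThreadStateTally {

    /** Returns how many live threads are in each Thread.State at the time of the call. */
    static Map<Thread.State, Integer> tally() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getThreadInfo(long[]) yields a null entry for any thread that died
        // between the two calls, so null entries are skipped.
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        tally().forEach((state, count) -> System.out.println(state + ": " + count));
    }
}
```

Calling tally() periodically (for example from a scheduled task) and logging the result gives a rough history of how quickly the BLOCKED count grew before a hang.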

In the Past

In the past, blocked threads usually showed this pattern:
1) There was an obvious bottleneck, e.g. a database record lock or file system lock problem that caused other threads to wait.
2) All blocked threads were blocked in the same class/method (e.g. the JDBC or file system classes).

Unusual Data

In this case, I see all sorts of classes/methods blocked, including JVM internal classes, JBoss classes, log4j, etc., in addition to application classes (including JDBC and Lucene calls).

The Question

What would cause a JVM to block in log4j.Hierarchy.getLogger or java.lang.reflect.Constructor.newInstance? Obviously some resource is "scarce", but which resource?

thanks

will

Stack Trace Excerpts

"http-0.0.0.0-80-417" daemon prio=6 tid=0x000000000f6f1800 nid=0x1a00 waiting for monitor entry [0x000000002dd5d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at sun.reflect.GeneratedConstructorAccessor68.newInstance(Unknown Source)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
                at java.lang.Class.newInstance0(Class.java:355)
                at java.lang.Class.newInstance(Class.java:308)
                at org.jboss.ejb.Container.createBeanClassInstance(Container.java:630)

"http-0.0.0.0-80-451" daemon prio=6 tid=0x000000000f184800 nid=0x14d4 waiting for monitor entry [0x000000003843d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at java.lang.Class.getDeclaredMethods0(Native Method)
                at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
                at java.lang.Class.getMethod0(Class.java:2670)

"http-0.0.0.0-80-449" daemon prio=6 tid=0x000000000f17d000 nid=0x2240 waiting for monitor entry [0x000000002fa5f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.register(Http11Protocol.java:638)
                - waiting to lock <0x00000007067515e8> (a org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler)
                at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.createProcessor(Http11Protocol.java:630)


"http-0.0.0.0-80-439" daemon prio=6 tid=0x000000000f701800 nid=0x1ed8 waiting for monitor entry [0x000000002f35b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.log4j.Hierarchy.getLogger(Hierarchy.java:261)
                at org.apache.log4j.Hierarchy.getLogger(Hierarchy.java:242)
                at org.apache.log4j.LogManager.getLogger(LogManager.java:198)

Accepted answer by andersoj

These are listed roughly in the order I would try them, depending on the evidence collected:

  • Have you looked at GC behavior? Are you under memory pressure? That could result in newInstance() and a few of the others above being blocked. Run your VM with -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc and log the output. Are you seeing excessive GC times near the time of the failure/lockup?
    • Is the condition repeatable? If so, try varying the heap size in the JVM (-Xmx) and see whether the behavior changes substantially. If it does, look for memory leaks or size the heap properly for your app.
    • If the previous step is tough, and you're not getting an OutOfMemoryError when you should, you can adjust the GC tunables; see the JDK 6.0 XX options or the JDK 6.0 GC Tuning Whitepaper. Look specifically at -XX:+UseGCOverheadLimit, -XX:GCTimeLimit=<percent>, and related options. (Note that these are not well documented, but may be useful.)
  • Might there be a deadlock? With only stack trace excerpts, that can't be determined here. Look for cycles among the monitors that threads are blocked on versus the monitors they hold. I believe jconsole can do this for you (yes: under the threads tab, "detect deadlocks").
  • Try taking several stack traces in succession and look for what changes versus what stays the same.
  • Do the forensics: for each stack entry that says "BLOCKED", look up the specific line of code and figure out whether a monitor is acquired there. If there is an actual monitor acquisition, it should be fairly easy to identify the limiting resource. However, some of your threads may show as blocked without an obvious monitor in the source; these will be trickier.
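
Besides the GC log flags mentioned above, GC overhead can be approximated from inside the process. This is a rough sketch under stated assumptions rather than a recommended monitor: the class GcOverheadSampler and the 5-second interval are made up for illustration, and GarbageCollectorMXBean.getCollectionTime() reports approximate accumulated collection time, which is not the same as application pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper (not from the original answer): estimates time spent in GC. */
final class GcOverheadSampler {

    /** Sums the accumulated collection time (ms) reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of the elapsed interval attributed to GC, from two totalGcMillis() readings. */
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (double) (gcMillisAfter - gcMillisBefore) / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 5_000; // assumed sampling interval, chosen arbitrarily
        long before = totalGcMillis();
        Thread.sleep(intervalMillis);
        long after = totalGcMillis();
        System.out.printf("Approximate GC fraction over %d ms: %.3f%n",
                intervalMillis, gcFraction(before, after, intervalMillis));
    }
}
```

A fraction that stays close to 1.0 across several samples would be consistent with the memory pressure scenario in the first bullet; the GC log remains the more detailed source.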
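
The deadlock check and the "forensics" step above can also be done programmatically with the same management API that jconsole uses. The sketch below is an illustration only: BlockedThreadReport and its method names are hypothetical, and it assumes Java 7 or later and a JVM that supports ThreadMXBean.findDeadlockedThreads() (HotSpot does; other JVMs may throw UnsupportedOperationException).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the original answer): reports deadlocks and lock owners. */
final class BlockedThreadReport {

    /** IDs of threads deadlocked on monitors or ownable synchronizers; empty if none. */
    static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids; // the API returns null when no deadlock exists
    }

    /** One line per BLOCKED thread: its name, the lock it waits for, and the lock's owner. */
    static List<String> describeBlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // Only request monitor/synchronizer details when the JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadIds().length);
        for (String line : describeBlockedThreads()) {
            System.out.println(line);
        }
    }
}
```

If most blocked threads name the same owner, that thread's stack is the place to look first; if no owner is reported for many of them, that fits the trickier case described in the last bullet.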