Original source: http://stackoverflow.com/questions/3156434/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Thread Dump Analysis Tool / Method
Asked by Ivo Rossi
When a Java application hangs and you don't even know which use case is causing it, I understand that thread dumps can be useful for investigating.
But how can we easily derive useful data from the thread dumps to find where the problem is? The server application that I've been working with produces very long thread dumps, because it is an EJB architecture and the thread dumps contain many container threads that I'm not sure I should be looking at (i.e. threads that are not running my application code, but JBoss's code).
Yesterday I tried the Thread Dump Analyzer tool. The tool is definitely better than looking at the raw thread dumps in a text editor, because you can filter out threads that you're not interested in, see the thread list, click on a thread to see its details, compare thread dumps to find long-running threads, etc. See screenshot below:
But there's still too much data to analyse - almost 300 threads. I don't know of any criteria that I could use to filter out all the JBoss threads, in which I'm not interested. I'm not sure if I should be looking at threads that are currently in "runnable" state only or if "waiting on condition" and "in Object.wait" are also important.
What approach would you normally follow, and which tools do you generally use?
Accepted answer by JoseK
One set of thread dumps alone will not be too helpful to get to the root cause.
The trick is to take 4 or 5 sets of thread dumps, with an interval of 5 seconds between each. At the end you will have a single log file which holds around 20 - 25 seconds' worth of action on the app server.
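As a rough illustration of the sampling idea, here is a minimal Java sketch that takes several dumps from inside the JVM using the standard `ThreadMXBean`. This is an assumption about how one might automate it, not what the answer describes: with a hung server you would normally trigger the dumps from outside the process (for example with `jstack` or `kill -3`). The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: samples the current JVM's threads at a fixed interval.
final class ThreadDumpSampler {

    /** Takes `count` dumps of all live threads, sleeping `intervalMillis` between dumps. */
    static List<ThreadInfo[]> sample(int count, long intervalMillis) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<ThreadInfo[]> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // false, false: skip locked-monitor and synchronizer details,
            // which keeps the call cheap and avoids optional JVM features.
            dumps.add(threads.dumpAllThreads(false, false));
            if (i < count - 1) {
                Thread.sleep(intervalMillis);
            }
        }
        return dumps;
    }

    /** Prints each dump under a numbered header so they can be compared by eye. */
    static void print(List<ThreadInfo[]> dumps) {
        for (int i = 0; i < dumps.size(); i++) {
            System.out.println("=== Thread dump " + (i + 1) + "/" + dumps.size() + " ===");
            for (ThreadInfo info : dumps.get(i)) {
                // ThreadInfo.toString() prints only the first few frames of each stack.
                System.out.print(info);
            }
        }
    }
}
```

Calling `ThreadDumpSampler.print(ThreadDumpSampler.sample(5, 5_000))` would follow the "5 dumps, 5 seconds apart" recipe and take about 20 seconds.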
What you want to check is this: when there is a stuck thread or a long-running transaction, every thread dump will show the same thread ID sitting at the same line of its Java stack trace. In simpler terms, the transaction (say in an EJB or the database) spans multiple thread dumps and hence needs more investigation.
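The comparison can be sketched in a few lines of Java. This is only an illustration under stated assumptions: each dump has already been parsed into a map from thread name to its top stack frame (the parsing itself is not shown), and `StuckThreadFinder` and the `com.example` frames are hypothetical names. `Map.of` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds threads that sit on the same frame in every dump.
final class StuckThreadFinder {

    /**
     * Each dump maps a thread name to its top stack frame.
     * Returns the threads that appear in every dump with an identical top frame.
     */
    static Map<String, String> findStuck(List<Map<String, String>> dumps) {
        Map<String, String> stuck = new LinkedHashMap<>();
        if (dumps.isEmpty()) {
            return stuck;
        }
        stuck.putAll(dumps.get(0));
        for (Map<String, String> dump : dumps.subList(1, dumps.size())) {
            // Drop any thread that is missing from this dump or has moved to another frame.
            stuck.entrySet().removeIf(e -> !e.getValue().equals(dump.get(e.getKey())));
        }
        return stuck;
    }

    public static void main(String[] args) {
        String leaseFrame =
                "com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)";
        List<Map<String, String>> dumps = new ArrayList<>();
        // Thread '20' and its frames are made up for the example.
        dumps.add(Map.of("ExecuteThread: '19'", leaseFrame,
                         "ExecuteThread: '20'", "com.example.Foo.a(Foo.java:10)"));
        dumps.add(Map.of("ExecuteThread: '19'", leaseFrame,
                         "ExecuteThread: '20'", "com.example.Foo.b(Foo.java:20)"));
        // Only thread '19' stays on the same frame in both dumps.
        System.out.println(findStuck(dumps));
    }
}
```

Idle pool threads also stay on the same frame from dump to dump, so a result like this is a list of candidates to inspect rather than a list of confirmed problems.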
Now when you run these through Samurai (I haven't used TDA myself), it will highlight these in red so you can quickly click on them and get to the lines showing issues.
See an example of this here. Look at the Samurai output image in that link. The Green cells are fine. Red and Grey cells need looking at.
A Samurai example from my own web app below shows a stuck sequence for Thread '19' across a span of 5 - 10 seconds:
> Thread dump 2/3
> "[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=7 tid=07b06000 nid=108 lwp_id=222813 waiting for monitor entry [2aa40000..2aa40b30]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)
>     - waiting to lock <735e9f88> (a com.bea.p13n.util.lease.JDBCLeaseManager)
>     at com.bea.p13n.util.lease.Lease$LeaseTimer.timerExpired(Lease.java:229)
...
...
> Thread dump 3/3
> "[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=7 tid=07b06000 nid=108 lwp_id=222813 waiting for monitor entry [2aa40000..2aa40b30]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)
>     - waiting to lock <735e9f88> (a com.bea.p13n.util.lease.JDBCLeaseManager)
>     at com.bea.p13n.util.lease.Lease$LeaseTimer.timerExpired(Lease.java:229)
Update
I recently used the Java Thread Dump Analyzer mentioned in this answer and, unlike Samurai, it has been very useful for Tomcat.
Answer by Michael Borgwardt
> I'm not sure if I should be looking at threads that are currently in "runnable" state only or if "waiting on condition" and "in Object.wait" are also important.
The latter two are actually the things to look for when diagnosing a deadlock, as you seem to be doing. "Runnable" means the thread is doing something right now (or waiting to get the CPU). "Blocked" and "waiting" are what deadlocks are made of.
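For true lock cycles, the JVM can do part of this work itself. The sketch below is one possible way to ask for deadlocked threads through the standard `ThreadMXBean` API. `DeadlockCheck` is a hypothetical name, and the check must run inside (or be attached to) the JVM under investigation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports threads that the JVM considers deadlocked.
final class DeadlockCheck {

    /** Returns one description per deadlocked thread, or an empty list if none are found. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> out = new ArrayList<>();
        if (ids == null) {
            return out;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            out.add(info.getThreadName() + " is " + info.getThreadState()
                    + ", waiting for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> deadlocks = describeDeadlocks();
        if (deadlocks.isEmpty()) {
            System.out.println("No deadlocked threads detected.");
        } else {
            deadlocks.forEach(System.out::println);
        }
    }
}
```

This only catches genuine cycles. A thread that is merely blocked behind a lock held for a long time, with no cycle, is not reported, so the manual inspection described here is still needed.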
Of course, an application container will have plenty of threads waiting legitimately. To pick out the interesting cases, look at the stack trace. If it consists of framework classes (especially ones called "Worker" or "Queue"), it's probably OK. If it contains application code, you should look at it more closely.
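That rule of thumb can be expressed as a small filter. The following is a minimal sketch, assuming the application's classes share a package prefix such as `com.example.app.` (a made-up value) and that the check runs inside the JVM being inspected. `SuspiciousThreads` is a hypothetical name, and treating `TIMED_WAITING` as a waiting state is my own choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists blocked or waiting threads whose stack contains application code.
final class SuspiciousThreads {

    /** True for the states that "blocked" and "waiting" threads report. */
    static boolean isWaitingState(Thread.State state) {
        return state == Thread.State.BLOCKED
                || state == Thread.State.WAITING
                || state == Thread.State.TIMED_WAITING;
    }

    /** True if any frame in the stack belongs to a class under the given package prefix. */
    static boolean touchesApp(StackTraceElement[] stack, String appPackagePrefix) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().startsWith(appPackagePrefix)) {
                return true;
            }
        }
        return false;
    }

    /** Names of live threads that are blocked or waiting somewhere below application code. */
    static List<String> find(String appPackagePrefix) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread thread = e.getKey();
            // The state is read after the stack snapshot, so the two may differ slightly.
            if (isWaitingState(thread.getState()) && touchesApp(e.getValue(), appPackagePrefix)) {
                names.add(thread.getName() + " [" + thread.getState() + "]");
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // "com.example.app." is a placeholder; pass the real application prefix instead.
        find("com.example.app.").forEach(System.out::println);
    }
}
```

Container threads parked in their own worker or queue classes fall out of the list, because none of their frames match the application prefix.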
Answer by mchr
I know this is an old question but I just wrote a tool to help make long thread dumps more readable.
Java Thread Dump Analysis Tool
This tool groups threads together which have the same stack trace and allows you to only show threads which are in particular states (e.g. RUNNABLE or BLOCKED).
This makes it a bit quicker to find the interesting threads amongst the tens or hundreds of JBoss threads which spend most of their time waiting for work at the same place in the code and therefore all have the same stack trace.
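The grouping idea itself is simple to sketch. The code below is not the tool's implementation, only an illustration of the technique: it groups the current JVM's threads by identical stack trace using `Thread.getAllStackTraces()`. `StackGrouper` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups threads that share exactly the same stack trace.
final class StackGrouper {

    /** Maps each distinct stack trace to the names of the threads currently showing it. */
    static Map<List<StackTraceElement>, List<String>> group(Map<Thread, StackTraceElement[]> stacks) {
        Map<List<StackTraceElement>, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : stacks.entrySet()) {
            // Lists compare element by element, so equal frames in equal order share a key.
            groups.computeIfAbsent(Arrays.asList(e.getValue()), k -> new ArrayList<>())
                  .add(e.getKey().getName());
        }
        return groups;
    }

    public static void main(String[] args) {
        group(Thread.getAllStackTraces()).forEach((stack, names) -> {
            System.out.println(names.size() + " thread(s): " + names);
            for (StackTraceElement frame : stack) {
                System.out.println("    at " + frame);
            }
        });
    }
}
```

With hundreds of idle container threads, most of them collapse into a handful of large groups, and the small groups are the ones worth reading first.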