multithreading: How to detect and debug multi-threading problems?

Question

Asked by MicSim

This is a follow-up to this question, where I didn't get any input on this point. Here is the brief question:

Is it possible to detect and debug problems coming from multi-threaded code?

Often we have to tell our customers: "We can't reproduce the problem here, so we can't fix it. Please tell us the steps to reproduce the problem, then we'll fix it." It's a somewhat nasty answer if I know that it is a multi-threading problem, but mostly I don't. How do I find out that a problem is a multi-threading issue, and how do I debug it?

I'd like to know if there are any special logging frameworks, debugging techniques, code inspectors, or anything else that helps solve such issues. General approaches are welcome. If an answer has to be language-specific, then keep it to .NET and Java.

Answer 1

Answered by Lawrence Dol

Threading/concurrency problems are notoriously difficult to replicate, which is one of the reasons why you should design to avoid them or at least minimize their probability. This is the reason immutable objects are so valuable. Try to isolate mutable objects to a single thread, and then carefully control the exchange of mutable objects between threads. Attempt to program with a design of object hand-over, rather than "shared" objects. For the latter, use fully synchronized control objects (which are easier to reason about), and avoid having a synchronized object utilize other objects which must also be synchronized; that is, try to keep them self-contained. Your best defense is a good design.
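The hand-over design described above might look roughly like the following in Java. This is only a minimal sketch: the class name HandOverSketch and the method sumViaHandOver are made up for illustration. A producer thread builds a mutable list, passes it through a BlockingQueue, and never touches it again, so at any moment only one thread works with the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a mutable object changes owner via a queue instead of being shared.
class HandOverSketch {

    // Builds a batch on a producer thread, hands it over, and sums it on the calling thread.
    static int sumViaHandOver(int count) {
        BlockingQueue<List<Integer>> channel = new ArrayBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            List<Integer> batch = new ArrayList<>();   // mutable, but confined to this thread
            for (int i = 1; i <= count; i++) {
                batch.add(i);
            }
            try {
                channel.put(batch);                    // hand-over: the producer never touches batch again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");
        producer.start();

        try {
            List<Integer> received = channel.take();   // the caller is now the only owner
            int sum = 0;
            for (int value : received) {
                sum += value;
            }
            producer.join();
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the batch", e);
        }
    }
}
```

Calling HandOverSketch.sumViaHandOver(100) sums the numbers 1 to 100 without any explicit locking in user code; the queue provides the synchronization at the moment of hand-over.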

Deadlocks are the easiest to debug, if you can get a stack trace when deadlocked. Given the trace (most tools that produce one also do deadlock detection), it's easy to pinpoint the cause and then reason about the code as to why it happens and how to fix it. With deadlocks, it is always going to be a problem of acquiring the same locks in different orders.
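As an illustration of the deadlock detection mentioned above, a Java program can also ask the JVM itself through the standard ThreadMXBean. The sketch below is an assumption-laden demo, not part of the original answer: the class DeadlockProbe and its methods are hypothetical names, and provokeDeadlock exists only to create a deadlock on purpose (two daemon threads taking two locks in opposite orders).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: query the JVM for deadlocked threads.
class DeadlockProbe {

    // Names of the threads the JVM currently reports as deadlocked, sorted; empty if none.
    static List<String> deadlockedThreadNames() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads();          // null when no deadlock is found
        List<String> names = new ArrayList<>();
        if (ids != null) {
            for (ThreadInfo info : bean.getThreadInfo(ids)) {
                if (info != null) {
                    names.add(info.getThreadName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    // Polls until a deadlock is reported or the timeout expires.
    static List<String> waitForDeadlock(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        List<String> names = deadlockedThreadNames();
        while (names.isEmpty() && System.nanoTime() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            names = deadlockedThreadNames();
        }
        return names;
    }

    // Demo only: two daemon threads take the same two locks in opposite orders.
    static void provokeDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startWorker("worker-1", lockA, lockB, bothHoldFirst);
        startWorker("worker-2", lockB, lockA, bothHoldFirst);
    }

    private static void startWorker(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirst) {
        Thread worker = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();                  // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {                     // never acquired: the other thread holds it
                    System.out.println(name + " got both locks");
                }
            }
        }, name);
        worker.setDaemon(true);                             // lets the JVM exit despite the deadlock
        worker.start();
    }
}
```

After DeadlockProbe.provokeDeadlock(), a call to DeadlockProbe.waitForDeadlock(5000) should report both worker threads. In a real application only deadlockedThreadNames would be kept, for example called from a watchdog thread that logs the result.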

Livelocks are harder: being able to observe the system while it is in the error state is your best bet there.

Race conditions tend to be extremely difficult to replicate, and are even harder to identify from manual code review. With these, the path I usually take, besides extensive testing to replicate, is to reason about the possibilities, and try to log information to prove or disprove theories. If you have direct evidence of state corruption, you may be able to reason about the possible causes based on the corruption.

The more complex the system, the harder it is to find concurrency errors and to reason about its behavior. Make use of tools like JVisualVM and profilers that can connect remotely; they can be a life saver if you can connect to a system in an error state and inspect the threads and objects.

Also, beware of differences in possible behavior which depend on the number of CPU cores, pipelines, bus bandwidth, etc. Changes in hardware can affect your ability to replicate the problem. Some problems will only show on single-core CPUs, others only on multi-core ones.

One last thing: try to use the concurrency objects distributed with the system libraries; e.g. in Java, java.util.concurrent is your friend. Writing your own concurrency control objects is hard and fraught with danger; leave it to the experts, if you have a choice.
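To make the java.util.concurrent advice concrete, here is a small sketch that counts keys from several threads using only library classes (ConcurrentHashMap and an ExecutorService) rather than hand-written locking. The class name LibraryPrimitives and the method countConcurrently are hypothetical, chosen just for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: shared counting built from java.util.concurrent classes only.
class LibraryPrimitives {

    // Every worker thread increments the counter of each key `repeats` times.
    static Map<String, Integer> countConcurrently(String[] keys, int threads, int repeats) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int r = 0; r < repeats; r++) {
                    for (String key : keys) {
                        counts.merge(key, 1, Integer::sum);   // atomic read-modify-write per key
                    }
                }
            });
        }
        pool.shutdown();                                       // no new tasks; let the workers finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts;
    }
}
```

With a plain HashMap and a get-then-put update, the same loop would lose increments under contention; merge on a ConcurrentHashMap performs the update atomically, so four threads with 1000 repeats each end up at exactly 4000 per key.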

Answer 2

Answered by Greg Mattes

I thought that the answer you got to your other question was pretty good. But I'll emphasize these points:

Only modify shared state in a critical section (mutual exclusion).

Acquire locks in a set order and release them in the opposite order.

Use pre-built abstractions whenever possible (like the stuff in java.util.concurrent).
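The first two points above can be sketched in Java as follows. The example is illustrative only: the Account class, its id field (used here as an assumed global lock order), and the method names are hypothetical. Both locks are always taken in ascending id order and released in the opposite order, and the balances are only modified while both locks are held.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: fixed lock order (by account id) plus mutation inside the critical section.
class OrderedLocking {

    static final class Account {
        final int id;                                   // assumed unique; defines the lock order
        final ReentrantLock lock = new ReentrantLock();
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    static void transfer(Account from, Account to, long amount) {
        if (from.id == to.id) {
            return;                                     // same account: nothing to move
        }
        Account first = from.id < to.id ? from : to;    // lower id is always locked first
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;                 // shared state changes only while both locks are held
                to.balance += amount;
            } finally {
                second.lock.unlock();                   // released in the opposite order
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Two threads transfer in opposite directions; returns the final balances {a, b}.
    static long[] runOpposingTransfers(int rounds) {
        Account a = new Account(1, 100_000);
        Account b = new Account(2, 100_000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                transfer(a, b, 1);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                transfer(b, a, 2);
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for transfers", e);
        }
        return new long[] {a.balance, b.balance};
    }
}
```

If transfer locked from first and to second instead, the two threads in runOpposingTransfers could each hold one lock and wait forever for the other. With the fixed order that cycle cannot form.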

Also, some analysis tools can detect some potential issues. For example, FindBugs can find some threading issues in Java programs. Such tools can't find all problems (they aren't silver bullets) but they can help.

As vanslly points out in a comment to this answer, studying well-placed logging output can also be very helpful, but beware of Heisenbugs.

Answer 3

Answered by krosenvold

When I have reports of troubles that are hard to reproduce, I always find these by reading code, preferably pair code reading, so you can discuss threading semantics and locking needs. When we do this based on a reported problem, I find we always nail one or more problems fairly quickly. I think it's also a fairly cheap technique for solving hard problems.

Sorry for not being able to tell you to press ctrl+shift+f13, but I don't think there's anything like that available. But just thinking about what the reported issue actually is usually gives a fairly strong sense of direction in the code, so you don't have to start at main().

Answer 4

Answered by mghie

In addition to the other good answers you already got: always test on a machine with at least as many processors / processor cores as the customer uses, or as there are active threads in your program. Otherwise some multithreading bugs may be hard or even impossible to reproduce.

Answer 5

Answered by ChrisW

Apart from crash dumps, one technique is extensive run-time logging, where each thread logs what it's doing.

The first question when an error is reported, then, might be, "Where's the log file?"

Sometimes you can see the problem in the log file: "This thread is detecting an illegal/unexpected state here ... and look, this other thread was doing that, just before and/or just after this."

If the log file doesn't say what's happening, then apologise to the customer, add sufficiently many extra logging statements to the code, give the new code to the customer, and say that you'll fix it after it happens one more time.

Answer 6

Answered by bLaXHyman

For Java there is a verification tool called javapathfinder, which I find useful for debugging and verifying multi-threaded applications against potential race condition and deadlock bugs in the code.
It works fine with both the Eclipse and NetBeans IDEs.

[2019] The GitHub repository: https://github.com/javapathfinder

Answer 7

Answered by Peter Huber

Sometimes, multithreaded solutions cannot be avoided. If there is a bug, it needs to be investigated in real time, which is nearly impossible with most tools like Visual Studio. The only practical solution is to write traces, although the tracing itself should:

not add any delay
not use any locking
be multithreading safe
trace what happened in the correct sequence.

This sounds like an impossible task, but it can be easily achieved by writing the trace into memory. In C#, it would look something like this:

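using System.Threading;

public const int MaxMessages = 0x100;  // must stay a power of two for the index mask below
string[] messages = new string[MaxMessages];
int messagesIndex = -1;

// Lock-free: each caller reserves its own slot with an atomic increment.
public void Trace(string message) {
  int thisIndex = Interlocked.Increment(ref messagesIndex);
  // Wrap around so the buffer keeps the most recent MaxMessages entries
  // instead of throwing once the array is full.
  messages[thisIndex & (MaxMessages - 1)] = message;
}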

The method Trace() is thread-safe, non-blocking and can be called from any thread. On my PC, it takes about 2 microseconds to execute, which should be fast enough.

Add Trace() calls wherever you think something might go wrong, let the program run, wait until the error happens, stop the trace and then investigate the trace for any errors.
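Since the question also asks about Java, the same in-memory tracing idea could be sketched there as follows. This is an adaptation under assumptions, not the answer author's code: the class MemoryTrace, its capacity of 0x100 entries, and the snapshot method are hypothetical, and each entry additionally records a sequence number and the thread name. The slots live in an AtomicReferenceArray so that entries written by one thread are visible to whichever thread later reads the trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: lock-free in-memory trace that keeps the most recent entries.
class MemoryTrace {

    private static final int CAPACITY = 0x100;          // power of two, so the index can be masked
    private static final AtomicReferenceArray<String> SLOTS = new AtomicReferenceArray<>(CAPACITY);
    private static final AtomicLong NEXT = new AtomicLong();

    // Non-blocking: reserves a slot with one atomic increment, then writes the entry.
    static void trace(String message) {
        long seq = NEXT.getAndIncrement();
        String entry = seq + " [" + Thread.currentThread().getName() + "] " + message;
        SLOTS.set((int) (seq & (CAPACITY - 1)), entry);  // older entries are overwritten
    }

    // Returns the retained entries in sequence order; intended to be called once tracing has stopped.
    static List<String> snapshot() {
        long end = NEXT.get();
        long start = Math.max(0, end - CAPACITY);
        List<String> entries = new ArrayList<>();
        for (long seq = start; seq < end; seq++) {
            String entry = SLOTS.get((int) (seq & (CAPACITY - 1)));
            if (entry != null) {
                entries.add(entry);
            }
        }
        return entries;
    }
}
```

Call MemoryTrace.trace("...") at the suspicious places and dump MemoryTrace.snapshot() after the error has occurred. Building the entry string allocates memory, so this version is not entirely free of overhead; whether that matters depends on how timing-sensitive the bug is.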

A more detailed description of this approach, which also collects thread and timing information, recycles the buffer and outputs the trace nicely, can be found at: CodeProject: Debugging multithreaded code in real time

Answer 8

Answered by Mouze

A little chart with some debugging techniques to keep in mind when debugging multithreaded code. The chart is growing; please leave comments and tips to be added. (update file at this link)

[Figure: Multithreaded debugging chart]

Answer 9

Answered by zvrba

assert() is your friend for detecting race conditions. Whenever you enter a critical section, assert that the invariant associated with it is true (that's what critical sections are for). Unfortunately, the check might be expensive and thus not suitable for use in a production environment.
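In Java, the invariant check described above could be sketched with the assert statement, which is disabled unless the JVM is started with -ea, so the cost is only paid in test runs. The class InvariantChecked and its slot-reservation logic are hypothetical and exist only to have an invariant to check (available + reserved == TOTAL).

```java
// Hypothetical sketch: assert the invariant on entry to and exit from each critical section.
class InvariantChecked {

    private static final int TOTAL = 10;
    private int available = TOTAL;
    private int reserved = 0;

    // The invariant that the synchronized methods are supposed to protect.
    private boolean invariantHolds() {
        return available >= 0 && reserved >= 0 && available + reserved == TOTAL;
    }

    synchronized boolean reserve(int n) {
        assert invariantHolds() : "invariant broken before reserve";   // only evaluated with -ea
        if (n < 0 || n > available) {
            return false;
        }
        available -= n;
        reserved += n;
        assert invariantHolds() : "invariant broken after reserve";
        return true;
    }

    synchronized boolean release(int n) {
        assert invariantHolds() : "invariant broken before release";
        if (n < 0 || n > reserved) {
            return false;
        }
        reserved -= n;
        available += n;
        assert invariantHolds() : "invariant broken after release";
        return true;
    }

    synchronized int available() {
        return available;
    }
}
```

If some code path ever modifies the fields without holding the lock, a later call may trip one of the assertions and point at the corrupted state much closer to the moment it happened than a downstream failure would.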

Answer 10

Answered by Thomas Krieger

I implemented the tool vmlens to detect race conditions in Java programs at runtime. It implements an algorithm called Eraser.
