Deadlock in Java ThreadPoolExecutor

Question

Asked by Vitaly

I ran into a situation where a thread is parked inside ThreadPoolExecutor's execute(Runnable) while all the pool's threads are waiting in getTask and the workQueue is empty.

Does anybody have any ideas?

The ThreadPoolExecutor is created with an ArrayBlockingQueue, and corePoolSize == maximumPoolSize == 4.

[Edit] To be more precise, the thread is blocked in ThreadPoolExecutor.execute(Runnable command). It has a task to execute, but doesn't execute it.

[Edit2] The executor is blocked somewhere inside the working queue (ArrayBlockingQueue).

[Edit3] The callstack:

thread = front_end(224)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
at java.util.concurrent.ArrayBlockingQueue.offer(ArrayBlockingQueue.java:224)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:653)
at net.listenThread.WorkersPool.execute(WorkersPool.java:45)

At the same time, the workQueue is empty (checked with a remote debugger).

[Edit4] Code working with ThreadPoolExecutor:

public class WorkersPool {
  // The class header and this field were not shown in the original post; they are
  // reconstructed from the constructor and the stack trace above.
  // IDLE_WORKER_THREAD_TIMEOUT, WORK_QUEUE_CAPACITY and THREAD_TERMINATION_WAIT_TIME
  // are constants defined elsewhere in the asker's code.
  private final ThreadPoolExecutor pool;

  public WorkersPool(int size) {
    // corePoolSize == maximumPoolSize == size, with a bounded work queue
    pool = new ThreadPoolExecutor(size, size, IDLE_WORKER_THREAD_TIMEOUT, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(WORK_QUEUE_CAPACITY),
        new ThreadFactory() {
          @NotNull
          private final AtomicInteger threadsCount = new AtomicInteger(0);

          @NotNull
          public Thread newThread(@NotNull Runnable r) {
            final Thread thread = new Thread(r);
            thread.setName("net_worker_" + threadsCount.incrementAndGet());
            return thread;
          }
        },
        // Tasks that do not fit into the queue are logged and dropped
        new RejectedExecutionHandler() {
          public void rejectedExecution(@Nullable Runnable r, @Nullable ThreadPoolExecutor executor) {
            Verify.warning("new task " + r + " is discarded");
          }
        });
  }

  public void execute(@NotNull Runnable task) {
    pool.execute(task);
  }

  public void stopWorkers() throws WorkersTerminationFailedException {
    pool.shutdownNow();
    try {
      pool.awaitTermination(THREAD_TERMINATION_WAIT_TIME, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      throw new WorkersTerminationFailedException("Workers-pool termination failed", e);
    }
  }
}

Answer 1

Answered by akf

I don't see any locking in the code of ThreadPoolExecutor's execute(Runnable). The only variable there is the workQueue. What sort of BlockingQueue did you provide to your ThreadPoolExecutor?
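
To illustrate why the queue matters here: once all core threads exist, execute(Runnable) hands the task to workQueue.offer(), which normally returns immediately even when the queue is full. The sketch below is only an illustration under assumptions (FullQueueDemo and countRejected are made-up names, and the pool is shrunk to one thread and one queue slot so the outcome is predictable); it is not the asker's code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; none of these names come from the original post.
class FullQueueDemo {

    /** Submits three tasks to a 1-thread pool with a 1-slot queue and returns how many were rejected. */
    static int countRejected() {
        final AtomicInteger rejected = new AtomicInteger();
        final CountDownLatch release = new CountDownLatch(1);

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(1),
                (task, executor) -> rejected.incrementAndGet()); // count instead of throwing

        // Each task just waits until the latch is released.
        Runnable waitForRelease = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        pool.execute(waitForRelease); // handed directly to the single worker thread
        pool.execute(waitForRelease); // offer() succeeds and fills the only queue slot
        pool.execute(waitForRelease); // offer() returns false, so the handler runs; no blocking

        release.countDown(); // let the two accepted tasks finish
        pool.shutdown();
        return rejected.get();
    }

    public static void main(String[] args) {
        System.out.println("rejected tasks: " + countRejected()); // prints "rejected tasks: 1"
    }
}
```

So a healthy offer() should either enqueue or fail fast; a thread parked on the queue's lock while the queue is empty points at something outside this logic.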

On the topic of deadlocks:

You can confirm this is a deadlock by examining the Full Thread Dump, as provided by <ctrl><break> on Windows or kill -QUIT on UNIX systems.

Once you have that data, you can examine the threads. Here is a pertinent excerpt from Sun's article on examining thread dumps (suggested reading):

For hanging, deadlocked or frozen programs: If you think your program is hanging, generate a stack trace and examine the threads in states MW or CW. If the program is deadlocked then some of the system threads will probably show up as the current threads, because there is nothing else for the JVM to do.
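
If taking a dump by hand is inconvenient, the same check can be approximated from inside the JVM. This is a rough sketch, assuming Java 6 or later; DeadlockCheck and deadlockedThreadCount are names invented for the example. Note that it only reports cycles of threads waiting on monitors or java.util.concurrent locks, so a hang with some other cause would not show up here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class; not part of the original answer.
class DeadlockCheck {

    /** Prints any deadlocked threads and returns how many the JVM reports (0 if none). */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no lock cycle is detected
        if (ids == null) {
            return 0;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info != null) { // a thread may have exited in the meantime
                System.out.println(info);
            }
        }
        return ids.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```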

On a lighter note: if you are running in an IDE, can you make sure that no breakpoints are enabled in these methods?

Answer 2

Answered by Jamie McCrindle

As someone already mentioned, this sounds like normal behaviour: the ThreadPoolExecutor is just waiting for work to do. If you want to stop it, you need to call:

executor.shutdown()

to get it to terminate, usually followed by a call to executor.awaitTermination.
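
Put together, that sequence might look like the sketch below. The class name ShutdownDemo, the helper shutdownAndWait and the 5-second timeout are assumptions made for illustration; falling back to shutdownNow() after a timeout is one common choice, not something the answer prescribes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class; not part of the original answer.
class ShutdownDemo {

    /** Shuts the executor down and reports whether it terminated within the timeout. */
    static boolean shutdownAndWait(ExecutorService executor, long timeoutSeconds) {
        executor.shutdown(); // stop accepting new tasks; already queued tasks still run
        try {
            if (executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                return true;
            }
            executor.shutdownNow(); // timed out: interrupt whatever is still running
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        executor.execute(() -> System.out.println("working"));
        System.out.println("terminated: " + shutdownAndWait(executor, 5));
    }
}
```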

Answer 3

Answered by Andriy

The library source code is below (it is in fact a class from http://spymemcached.googlecode.com/files/memcached-2.4.2-sources.zip). It is a bit complicated, with added protection against repeated calls of the FutureTask if I'm not mistaken, but it doesn't look deadlock-prone. It is very simple ThreadPool usage:

package net.spy.memcached.transcoders;

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

import net.spy.memcached.CachedData;
import net.spy.memcached.compat.SpyObject;

/**
 * Asynchronous transcoder.
 */
public class TranscodeService extends SpyObject {

    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 10, 60L,
            TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(100),
            new ThreadPoolExecutor.DiscardPolicy());

    /**
     * Perform a decode.
     */
    public <T> Future<T> decode(final Transcoder<T> tc,
            final CachedData cachedData) {

        assert !pool.isShutdown() : "Pool has already shut down.";

        TranscodeService.Task<T> task = new TranscodeService.Task<T>(
                new Callable<T>() {
                    public T call() {
                        return tc.decode(cachedData);
                    }
                });

        if (tc.asyncDecode(cachedData)) {
            this.pool.execute(task);
        }
        return task;
    }

    /**
     * Shut down the pool.
     */
    public void shutdown() {
        pool.shutdown();
    }

    /**
     * Ask whether this service has been shut down.
     */
    public boolean isShutdown() {
        return pool.isShutdown();
    }

    private static class Task<T> extends FutureTask<T> {
        private final AtomicBoolean isRunning = new AtomicBoolean(false);

        public Task(Callable<T> callable) {
            super(callable);
        }

        @Override
        public T get() throws InterruptedException, ExecutionException {
            this.run();
            return super.get();
        }

        @Override
        public T get(long timeout, TimeUnit unit) throws InterruptedException,
                ExecutionException, TimeoutException {
            this.run();
            return super.get(timeout, unit);
        }

        @Override
        public void run() {
            if (this.isRunning.compareAndSet(false, true)) {
                super.run();
            }
        }
    }

}

Answer 4

Answered by artemv

Definitely strange.

But before writing your own TPE try:

another BlockingQueue implementation, e.g. LinkedBlockingQueue
specifying fairness=true in ArrayBlockingQueue, i.e. using new ArrayBlockingQueue(n, true)

Of those two options I would choose the second one, because it is very strange for offer() to block. One reason that comes to mind is the thread scheduling policy on your Linux. That is just an assumption.
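
For reference, the two variants could be wired up roughly as follows. This is a sketch, not a fix: QueueChoices, the factory method names, and the 60-second keep-alive are assumptions made for illustration, while both queue constructors are standard java.util.concurrent API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory class; not part of the original answer.
class QueueChoices {

    /** Option 1: a bounded LinkedBlockingQueue instead of ArrayBlockingQueue. */
    static ThreadPoolExecutor withLinkedQueue(int threads, int capacity) {
        return new ThreadPoolExecutor(threads, threads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(capacity));
    }

    /** Option 2: an ArrayBlockingQueue whose internal lock uses a fair (FIFO) ordering policy. */
    static ThreadPoolExecutor withFairArrayQueue(int threads, int capacity) {
        return new ThreadPoolExecutor(threads, threads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(capacity, true));
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = withFairArrayQueue(4, 100);
        pool.execute(() -> System.out.println("ran on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```

Fairness usually costs some throughput, so it is probably best treated as a diagnostic step here.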

Answer 5

Answered by nicholas.hauschild

It sounds like a bug in JVMs older than 6u21. There was an issue in the compiled native code for some (maybe all) operating systems.

From the link:

The bug is caused by missing memory barriers in various Parker::park() paths that can result in lost wakeups and hangs. (Note that PlatformEvent::park used by built-in synchronization is not vulnerable to the issue). -XX:+UseMembar constitutes a work-around because the membar barrier in the state transition logic hides the problem in Parker::. (that is, there's nothing wrong with the use -UseMembar mechanism, but +UseMembar hides the bug Parker::). This is a day-one bug introduced with the addition of java.util.concurrent in JDK 5.0. I developed a simple C mode of the failure and it seems more likely to manifest on modern AMD and Nehalem platforms, likely because of deeper store buffers that take longer to drain. I provided a tentative fix to Doug Lea for Parker::park which appears to eliminate the bug. I'll be delivering this fix to runtime. (I'll also augment the CR with additional test cases and a longer explanation). This is likely a good candidate for back-ports.

Link: JVM Bug

Workarounds are available, but you would probably be best off just getting the most recent copy of Java.

Answer 6

Answered by Alexey Efimov

This deadlock probably happens because you submit tasks from within the executor itself. For example, you submit one task, and it spawns another 4 tasks. If the pool size is 4, you overflow it completely, and the last task has to wait until one of the running tasks returns. But the first task is waiting for all the forked tasks to complete.
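
The effect can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the asker's code: it uses a single-thread pool instead of 4 threads so the result is deterministic, and a 500 ms timeout so the demo ends instead of hanging. StarvationDemo and outerTaskStarves are invented names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; not part of the original answer.
class StarvationDemo {

    /** Returns true if the outer task timed out waiting for a task it submitted to its own pool. */
    static boolean outerTaskStarves() {
        final ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                // The inner task is queued behind the outer task, which holds the only thread.
                Future<String> inner = pool.submit(() -> "done");
                try {
                    inner.get(500, TimeUnit.MILLISECONDS); // without a timeout this would wait forever
                    return false;
                } catch (TimeoutException e) {
                    return true; // the inner task never got a thread to run on
                }
            });
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // discard the queued inner task and stop the worker
        }
    }

    public static void main(String[] args) {
        System.out.println("outer task starved: " + outerTaskStarves()); // prints "outer task starved: true"
    }
}
```

A thread dump of this kind of hang shows workers blocked in Future.get rather than parked on the queue's lock, which is one way to tell it apart from the stack trace in the question.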
