Python: When should we call multiprocessing.Pool.join?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA terms, link to the original, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/38271547/


When should we call multiprocessing.Pool.join?

python, python-multiprocessing

Asked by hch

I am using 'multiprocessing.Pool.imap_unordered' as follows:

from multiprocessing import Pool
pool = Pool()
for mapped_result in pool.imap_unordered(mapping_func, args_iter):
    pass  # do some additional processing on mapped_result

Do I need to call pool.close or pool.join after the for loop?

Answered by Bamcclur

No, you don't, but it's probably a good idea if you aren't going to use the pool anymore.

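If you do clean up, a minimal sketch of that pattern, reusing the question's mapping_func and args_iter placeholders, could look like this:

from multiprocessing import Pool

pool = Pool()
for mapped_result in pool.imap_unordered(mapping_func, args_iter):
    pass  # additional processing on each mapped_result goes here
pool.close()   # no more tasks will be submitted to this pool
pool.join()    # block until every worker process has exited

Note that close() (or terminate()) must be called before join(); calling join() on a pool that is still accepting work raises a ValueError.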

Reasons for calling pool.close or pool.join are well said by Tim Peters in this SO post:

As to Pool.close(), you should call that when - and only when - you're never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. Then the worker processes will terminate when all work already assigned has completed.

It's also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there's often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes that you'd otherwise never see.

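To make the sequence Tim Peters describes concrete, here is a small sketch (the square worker and its deliberate failure mode are invented for illustration): work is submitted with apply_async, close() marks the end of submissions, join() is the synchronization point, and any worker exception is re-raised when the corresponding result is fetched with get():

from multiprocessing import Pool

def square(x):
    if x < 0:
        raise ValueError("negative input")  # simulated worker failure
    return x * x

if __name__ == "__main__":
    pool = Pool(4)
    async_results = [pool.apply_async(square, (n,)) for n in range(5)]
    pool.close()   # the parallelizable part of the program is finished
    pool.join()    # wait for all worker processes to terminate
    for r in async_results:
        print(r.get())   # a worker exception would be re-raised here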

Answered by Odysseus Ithaca

I had the same memory issue as described in "Memory usage keep growing with Python's multiprocessing.pool" when I didn't use pool.close() and pool.join() while using pool.map() with a function that calculated Levenshtein distance. The function worked fine, but wasn't garbage collected properly on a Win7 64 machine, and the memory usage kept growing out of control every time the function was called, until it took the whole operating system down. Here's the code that fixed the leak:

from multiprocessing import Pool

stringList = []
for possible_string in stringArray:
    stringList.append((searchString, possible_string))

pool = Pool(5)  # 5 worker processes
results = pool.map(myLevenshteinFunction, stringList)
pool.close()    # no more tasks will be submitted to this pool
pool.join()     # wait for the worker processes to exit and release their resources

After closing and joining the pool, the memory leak went away.

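A side note beyond the original answer: if a worker function itself leaks memory, the Pool constructor's maxtasksperchild argument can also help, because it replaces each worker process after a fixed number of tasks and so returns whatever memory that worker accumulated to the operating system. A minimal sketch, reusing the names from the answer above:

from multiprocessing import Pool

# Each worker process is recycled after 100 tasks, bounding per-process memory growth.
pool = Pool(5, maxtasksperchild=100)
results = pool.map(myLevenshteinFunction, stringList)
pool.close()
pool.join()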