Python: When should we call multiprocessing.Pool.join?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA terms, link to the original, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/38271547/


When should we call multiprocessing.Pool.join?

python, python-multiprocessing

Asked by hch

I am using 'multiprocessing.Pool.imap_unordered' as follows:

from multiprocessing import Pool
pool = Pool()
for mapped_result in pool.imap_unordered(mapping_func, args_iter):
    pass  # do some additional processing on mapped_result

Do I need to call pool.close or pool.join after the for loop?

Answered by Bamcclur

No, you don't, but it's probably a good idea if you aren't going to use the pool anymore.

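If you do clean up, a minimal sketch of that pattern, reusing the question's mapping_func and args_iter placeholders, could look like this:

from multiprocessing import Pool

pool = Pool()
for mapped_result in pool.imap_unordered(mapping_func, args_iter):
    pass  # additional processing on each mapped_result goes here
pool.close()   # no more tasks will be submitted to this pool
pool.join()    # block until every worker process has exited

Note that close() (or terminate()) must be called before join(); calling join() on a pool that is still accepting work raises a ValueError.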

Reasons for calling pool.close or pool.join are well said by Tim Peters in this SO post:

As to Pool.close(), you should call that when - and only when - you're never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. Then the worker processes will terminate when all work already assigned has completed.

It's also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there's often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes that you'd otherwise never see.

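To make the sequence Tim Peters describes concrete, here is a small sketch (the square worker and its deliberate failure mode are invented for illustration): work is submitted with apply_async, close() marks the end of submissions, join() is the synchronization point, and any worker exception is re-raised when the corresponding result is fetched with get():

from multiprocessing import Pool

def square(x):
    if x < 0:
        raise ValueError("negative input")  # simulated worker failure
    return x * x

if __name__ == "__main__":
    pool = Pool(4)
    async_results = [pool.apply_async(square, (n,)) for n in range(5)]
    pool.close()   # the parallelizable part of the program is finished
    pool.join()    # wait for all worker processes to terminate
    for r in async_results:
        print(r.get())   # a worker exception would be re-raised here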

Answered by Odysseus Ithaca

I had the same memory issue as described in "Memory usage keep growing with Python's multiprocessing.pool" when I didn't use pool.close() and pool.join() while using pool.map() with a function that calculated Levenshtein distance. The function worked fine, but wasn't garbage collected properly on a Win7 64 machine, and the memory usage kept growing out of control every time the function was called, until it took the whole operating system down. Here's the code that fixed the leak:

from multiprocessing import Pool

stringList = []
for possible_string in stringArray:
    stringList.append((searchString, possible_string))

pool = Pool(5)  # 5 worker processes
results = pool.map(myLevenshteinFunction, stringList)
pool.close()    # no more tasks will be submitted to this pool
pool.join()     # wait for the worker processes to exit and release their resources

After closing and joining the pool, the memory leak went away.

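A side note beyond the original answer: if a worker function itself leaks memory, the Pool constructor's maxtasksperchild argument can also help, because it replaces each worker process after a fixed number of tasks and so returns whatever memory that worker accumulated to the operating system. A minimal sketch, reusing the names from the answer above:

from multiprocessing import Pool

# Each worker process is recycled after 100 tasks, bounding per-process memory growth.
pool = Pool(5, maxtasksperchild=100)
results = pool.map(myLevenshteinFunction, stringList)
pool.close()
pool.join()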