Note: this page reproduces a popular StackOverflow question and its accepted answer under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/26520781/

Date: 2020-08-19 00:35:20  Source: igfitidea

multiprocessing.Pool: What's the difference between map_async and imap?

Tags: python, multiprocessing, python-multiprocessing

Asked by spacegoing

I'm trying to learn how to use Python's multiprocessing package, but I don't understand the difference between map_async and imap. I noticed that both map_async and imap are executed asynchronously. So when should I use one over the other? And how should I retrieve the result returned by map_async?

Should I use something like this?

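import multiprocessing

def work(x):  # hypothetical placeholder; the original call passed no arguments
    return x * x

def test():
    pool = multiprocessing.Pool()
    result = pool.map_async(work, range(5))  # returns an AsyncResult at once
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the workers to finish
    return result.get()  # the full list of results

if __name__ == "__main__":
    result = test()
    for i in result:
        print(i)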

Accepted answer by dano

There are two key differences between imap/imap_unordered and map/map_async:

  1. The way they consume the iterable you pass to them.
  2. The way they return the result back to you.

map consumes your iterable by converting the iterable to a list (assuming it isn't a list already), breaking it into chunks, and sending those chunks to the worker processes in the Pool. Breaking the iterable into chunks performs better than passing each item in the iterable between processes one item at a time - particularly if the iterable is large. However, turning the iterable into a list in order to chunk it can have a very high memory cost, since the entire list will need to be kept in memory.
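
As a minimal sketch of this behaviour, the snippet below passes a generator to map and records how many items were pulled from it. The names square, numbers and run_map are made up for this illustration, and the chunksize of 3 is an arbitrary choice. Because map needs the whole input up front, the generator should be exhausted before any result comes back:

```python
import multiprocessing


def square(x):
    return x * x


def numbers(n, pulled):
    """Yield 0..n-1, noting each item in `pulled` as it leaves the generator."""
    for i in range(n):
        pulled.append(i)
        yield i


def run_map(pool, n=10, chunksize=3):
    """Hypothetical helper: run pool.map over a generator of n numbers."""
    pulled = []
    # map materializes the generator as a list, then ships it in chunks.
    results = pool.map(square, numbers(n, pulled), chunksize=chunksize)
    return results, len(pulled)


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        results, consumed = run_map(pool)
    print(results)   # a plain list, in input order
    print(consumed)  # number of items pulled from the generator
```

The generator runs in the parent process, so the count reflects how much input the parent had to hold before the call returned.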

imap doesn't turn the iterable you give it into a list, nor does it break it into chunks (by default). It iterates over the iterable one element at a time and sends each element to a worker process. This means you don't take the memory hit of converting the whole iterable to a list, but it also means performance is slower for large iterables because of the lack of chunking. This can be mitigated, however, by passing a chunksize argument larger than the default of 1.
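
To make the chunksize trade-off concrete, here is a small sketch that feeds a generator to imap. The helper name sum_of_squares and the chunk size of 100 are illustrative assumptions, not values from the original answer. The input is never turned into a list, yet items still travel to the workers in batches:

```python
import multiprocessing


def square(x):
    return x * x


def sum_of_squares(pool, n, chunksize=100):
    """Hypothetical helper: add up square(i) for i in 0..n-1 using imap."""
    inputs = (i for i in range(n))  # a generator, never converted to a list
    total = 0
    # Each result is consumed and discarded, so memory use stays small.
    for value in pool.imap(square, inputs, chunksize=chunksize):
        total += value
    return total


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        print(sum_of_squares(pool, 100000))
```

A good chunksize depends on how expensive each task is, so it is worth measuring a few values rather than trusting the 100 used here.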

The other major difference between imap/imap_unordered and map/map_async is that with imap/imap_unordered, you can start receiving results from workers as soon as they're ready, rather than having to wait for all of them to finish. With map_async, an AsyncResult is returned right away, but you can't actually retrieve results from that object until all of them have been processed, at which point it returns the same list that map does (map is actually implemented internally as map_async(...).get()). There's no way to get partial results; you either have the entire result, or nothing.
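
For the map_async side, a rough sketch of retrieving the result could look like the following. The names slow_double and run_map_async, the 0.2 second sleep, and the 30 second timeout are all assumptions chosen for the example:

```python
import multiprocessing
import time


def slow_double(x):
    time.sleep(0.2)  # stand-in for real work
    return 2 * x


def run_map_async(pool, items):
    """Hypothetical helper: submit with map_async, then block for the full list."""
    async_result = pool.map_async(slow_double, items)
    # The parent is free to do other work here. ready() reports, without
    # blocking, whether every task has finished.
    was_ready = async_result.ready()
    # get() blocks until all results exist (or raises on timeout).
    results = async_result.get(timeout=30)
    return was_ready, results, async_result.ready()


if __name__ == "__main__":
    with multiprocessing.Pool(processes=3) as pool:
        print(run_map_async(pool, [1, 2, 3]))
```

Note that get() waits for the results by itself, so calling close() and join() first, as in the question, is a cleanup step rather than a requirement for retrieving the list.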

imap and imap_unordered both return iterables right away. With imap, the results will be yielded from the iterable as soon as they're ready, while still preserving the ordering of the input iterable. With imap_unordered, results will be yielded as soon as they're ready, regardless of the order of the input iterable. So, say you have this:

import multiprocessing
import time

def func(x):
    time.sleep(x)
    return x + 2

if __name__ == "__main__":    
    p = multiprocessing.Pool()
    start = time.time()
    for x in p.imap(func, [1,5,3]):
        print("{} (Time elapsed: {}s)".format(x, int(time.time() - start)))

This will output:

3 (Time elapsed: 1s)
7 (Time elapsed: 5s)
5 (Time elapsed: 5s)

If you use p.imap_unordered instead of p.imap, you'll see:

3 (Time elapsed: 1s)
5 (Time elapsed: 3s)
7 (Time elapsed: 5s)

If you use p.map or p.map_async().get(), you'll see:

3 (Time elapsed: 5s)
7 (Time elapsed: 5s)
5 (Time elapsed: 5s)

So, the primary reasons to use imap/imap_unordered over map_async are:

  1. Your iterable is large enough that converting it to a list would cause you to run out of/use too much memory.
  2. You want to be able to start processing the results before all of them are completed.
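
The second reason can be sketched as a progress reporter that handles each result the moment it arrives. The names slow_task, process_as_completed and report are hypothetical, and the delays are arbitrary:

```python
import multiprocessing
import time


def slow_task(seconds):
    time.sleep(seconds)  # stand-in for real work
    return seconds


def process_as_completed(pool, delays, handle):
    """Hypothetical helper: call handle(done, total, value) per finished task."""
    done = 0
    # imap_unordered yields each result as soon as a worker finishes it.
    for value in pool.imap_unordered(slow_task, delays):
        done += 1
        handle(done, len(delays), value)
    return done


if __name__ == "__main__":
    def report(done, total, value):
        print("{}/{} finished (result: {})".format(done, total, value))

    with multiprocessing.Pool(processes=3) as pool:
        process_as_completed(pool, [0.3, 0.1, 0.2], report)
```

The handler runs in the parent process, so it can update a progress bar or write to a file without any extra synchronization. With map_async the same loop could only start after the slowest task had finished.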