"chunksize" parameter in Python's multiprocessing.Pool.map
Original question: http://stackoverflow.com/questions/3822512/
License: this page reproduces a Stack Overflow question and answer under CC BY-SA 4.0. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Asked by sergio
If I have a pool object with 2 processors for example:
p = multiprocessing.Pool(2)
and I want to iterate over a list of files in a directory and use the map function,
could someone explain what the chunksize argument of this function means:
p.map(func, iterable[, chunksize])
If I set the chunksize to 10, for example, does that mean every 10 files should be processed by one processor?
Answered by detly
Looking at the documentation for Pool.map, it seems you're almost correct: the chunksize parameter will cause the iterable to be split into pieces of approximately that size, and each piece is submitted as a separate task.
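To make the splitting concrete, here is a minimal sketch of how an iterable can be cut into pieces of a given size. It illustrates the idea only and is not the library's internal code; the helper name split_into_chunks and the file names are hypothetical. It also shows one reason the size is only approximate: the last piece can be shorter.

```python
from itertools import islice


def split_into_chunks(iterable, chunksize):
    # Hypothetical helper: yield successive tuples of up to `chunksize`
    # items. The final tuple may be shorter than `chunksize`.
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


# 25 made-up file names standing in for a directory listing.
files = [f"file_{i}.txt" for i in range(1, 26)]
chunks = list(split_into_chunks(files, 10))
print([len(c) for c in chunks])  # [10, 10, 5]
```

With 25 items and a chunk size of 10, this gives three tasks, so a call such as p.map(func, files, 10) would hand work to the pool in three pieces under this model.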
So in your example, yes, map will take the first 10 (approximately) and submit them as a task for a single processor, then the next 10 will be submitted as another task, and so on. Note that this doesn't mean the processors will alternate every 10 files; it's quite possible that processor #1 ends up getting 1-10 AND 11-20, and processor #2 gets 21-30 and 31-40.
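One way to see which worker actually receives which chunk is to tag every item with the process ID of the worker that handled it. The sketch below assumes 40 integers as stand-ins for file names; tag_with_pid and group_by_worker are hypothetical helpers, not part of multiprocessing. The assignment of chunks to workers can differ from run to run, so no particular output is implied.

```python
import multiprocessing
import os
from collections import defaultdict


def tag_with_pid(item):
    # Return the item together with the PID of the worker that handled it.
    return item, os.getpid()


def group_by_worker(tagged):
    # Collect the items each worker PID processed, in result order.
    groups = defaultdict(list)
    for item, pid in tagged:
        groups[pid].append(item)
    return dict(groups)


def main():
    items = list(range(1, 41))  # stand-in for 40 file names
    with multiprocessing.Pool(2) as pool:
        # Each task handed to a worker holds about 10 consecutive items.
        tagged = pool.map(tag_with_pid, items, chunksize=10)
    for pid, handled in group_by_worker(tagged).items():
        print(pid, handled)


if __name__ == "__main__":
    main()
```

Each printed line lists the items one worker processed. Runs of 10 consecutive numbers correspond to single chunks, and one worker may well show two adjacent runs, such as 1-10 and 11-20.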

