Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/26432411/

Time: 2020-08-19 00:27:57  Source: igfitidea

multiprocessing.dummy in Python is not utilising 100% cpu

python parallel-processing multiprocessing

Asked by Demyanov

I am working on a machine learning project in Python and need to run my predict function in parallel. This is the function I use in my program:

from multiprocessing.dummy import Pool
from multiprocessing import cpu_count


def multi_predict(X, predict, *args, **kwargs):
    pool = Pool(cpu_count())
    results = pool.map(predict, X)
    pool.close()
    pool.join()
    return results

The problem is that each of my CPUs is loaded to only 20-40% (summed across all of them, the load is about 100%). I use multiprocessing.dummy because I ran into problems pickling the function with the multiprocessing module.

Accepted answer by dano

When you use multiprocessing.dummy, you're using threads, not processes:

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.
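
As a quick check of the statement above, the following sketch inspects the pool object and records which process runs each task. The helper name `where_am_i` is made up for illustration; everything else comes from the standard library.

```python
import os
import threading
from multiprocessing.dummy import Pool
from multiprocessing.pool import ThreadPool


def where_am_i(_):
    # Report which process and which thread executed this task.
    return os.getpid(), threading.current_thread().name


pool = Pool(4)
# multiprocessing.dummy.Pool hands back a thread-based pool.
print(isinstance(pool, ThreadPool))  # True

results = pool.map(where_am_i, range(8))
pool.close()
pool.join()

# Every task ran inside the parent process, just on different threads.
pids = {pid for pid, _ in results}
print(pids == {os.getpid()})  # True
```

Because all workers live in one process, they share one interpreter and therefore one GIL.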

That means you're restricted by the Global Interpreter Lock (GIL), and only one thread can actually execute CPU-bound operations at a time. That's going to keep you from fully utilizing your CPUs. If you want to get full parallelism across all available cores, you're going to need to address the pickling issue you're hitting with multiprocessing.Pool.
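
The question does not show the actual pickling error, so the following is only a sketch of one common situation: the callable passed to the pool is a lambda or nested function, which pickle cannot serialize, and moving it to module level helps. The names `predict_one`, `scale`, and `can_pickle` are hypothetical stand-ins, and forwarding `*args` and `**kwargs` through `functools.partial` is an assumption about how the original function was meant to use them.

```python
import pickle
from functools import partial
from multiprocessing import Pool, cpu_count


def predict_one(x, scale=1.0):
    # Hypothetical stand-in for a real model's predict call.
    return x * scale


def can_pickle(obj):
    # A process pool has to pickle the callable to send it to the workers,
    # so this is a cheap way to test a candidate before using it.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def multi_predict(X, predict, *args, **kwargs):
    # Bind the extra arguments up front; a partial wrapping a module-level
    # function can still be pickled.
    func = partial(predict, *args, **kwargs)
    with Pool(cpu_count()) as pool:
        return pool.map(func, X)


print(can_pickle(predict_one))      # module-level function
print(can_pickle(lambda x: x * 2))  # lambda, not picklable by reference
```

On platforms where worker processes are spawned rather than forked, a call such as `multi_predict(X, predict_one, scale=2.0)` should sit under an `if __name__ == "__main__":` guard.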

Note that multiprocessing.dummy might still be useful if the work you need to parallelize is I/O-bound, or utilizes a C extension that releases the GIL. For pure Python code, however, you'll need multiprocessing.
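
To illustrate the I/O-bound case, here is a minimal sketch that uses `time.sleep` as a stand-in for a blocking network or disk call (sleeping releases the GIL, as real blocking I/O typically does). The function names `fake_io_task` and `run_in_threads` are invented for this example.

```python
import time
from multiprocessing.dummy import Pool  # thread-based pool


def fake_io_task(delay):
    # Simulated blocking I/O; the GIL is released while sleeping.
    time.sleep(delay)
    return delay


def run_in_threads(delays, workers=4):
    # Run all tasks on a thread pool and measure the wall-clock time.
    start = time.perf_counter()
    with Pool(workers) as pool:
        results = pool.map(fake_io_task, delays)
    return results, time.perf_counter() - start


results, elapsed = run_in_threads([0.2] * 4)
print(results, round(elapsed, 2))
```

Run one after another, the four tasks would take about 0.8 seconds; with four threads the waits overlap, so the elapsed time should be close to a single 0.2-second delay.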
