Parfor for Python
Original URL: http://stackoverflow.com/questions/4682429/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and link back to the original address:
Stack Overflow
Asked by Dat Chu
I am looking for a definitive answer to MATLAB's parfor for Python (Scipy, Numpy).
Is there a solution similar to parfor? If not, what is the complication for creating one?
UPDATE: Here is a typical numerical computation that I need to speed up:
import numpy as np

N = 2000
output = np.zeros([N,N])

for i in range(N):
    for j in range(N):
        output[i,j] = HeavyComputationThatIsThreadSafe(i,j)
An example of a heavy computation function is:
import scipy.optimize

# Note: scipy.optimize.anneal was removed in later SciPy releases;
# scipy.optimize.dual_annealing is its modern replacement.
def HeavyComputationThatIsThreadSafe(i,j):
    n = i * j
    return scipy.optimize.anneal(lambda x: np.sum((x-np.arange(n)**2)), np.random.random((n,1)))[0][0,0]
Accepted answer by Sven Marnach
There are many Python frameworks for parallel computing. The one I happen to like most is IPython, but I don't know too much about any of the others. In IPython, one analogue to parfor would be client.MultiEngineClient.map() or some of the other constructs in the documentation on quick and easy parallelism.
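MultiEngineClient dates from the old IPython parallel API; as a rough sketch of the same pattern using its successor, the ipyparallel package (assuming an engine cluster has already been started, e.g. with ipcluster start, and with heavy as an illustrative stand-in for the real computation):

import ipyparallel as ipp

def heavy(ij):
    # Stand-in for the real per-element computation; it must be
    # defined so the engines can deserialize and run it.
    i, j = ij
    return i * j

rc = ipp.Client()                # connect to the running ipcluster
view = rc.load_balanced_view()   # schedule tasks across all engines
results = view.map_sync(heavy, [(i, j) for i in range(5) for j in range(5)])
print(results)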
Answered by David Heffernan
I've always used Parallel Python, but it's not a complete analogue since I believe it typically uses separate processes, which can be expensive on certain operating systems. Still, if the body of your loop is chunky enough then this won't matter and it can actually have some benefits.
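For reference, a minimal sketch of the Parallel Python submit-and-collect pattern; the job function here is an illustrative stand-in, and the calls reflect pp's documented API as I remember it:

import pp

def heavy(i, j):
    return i * j  # placeholder for the real per-iteration work

job_server = pp.Server()  # by default starts one worker process per detected CPU
jobs = [job_server.submit(heavy, (i, j)) for i in range(4) for j in range(4)]
results = [job() for job in jobs]  # calling a job blocks until its result is ready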
Answered by JudoWill
The one built into Python is multiprocessing; the docs are here. I always use multiprocessing.Pool with as many workers as processors. Then whenever I need a for-loop-like structure I use Pool.imap.
As long as the body of your function does not depend on any previous iteration, you should get near-linear speed-up. This also requires that your inputs and outputs are pickle-able, but that is pretty easy to ensure for standard types.
UPDATE: Some code for your updated function just to show how easy it is:
import numpy as np
from multiprocessing import Pool
from itertools import product

def Fun(args):
    # Pool.imap passes a single argument, so unpack the (i, j) tuple here
    i, j = args
    return HeavyComputationThatIsThreadSafe(i, j)

output = np.zeros((N, N))
pool = Pool()   # defaults to the number of available CPUs
chunksize = 20  # this may take some guessing ... take a look at the docs to decide
for ind, res in enumerate(pool.imap(Fun, product(range(N), range(N)), chunksize)):
    output.flat[ind] = res
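One caveat that is easy to trip over: on platforms where multiprocessing uses the spawn start method (Windows, and macOS on recent Python versions), the Pool creation and the loop must sit under an if __name__ == '__main__': guard, as the Jupyter answer below does, or the workers will re-execute the module on import.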
Answered by rsc05
Jupyter Notebook
As an example, suppose you want to write the equivalent of this MATLAB code in Python:
matlabpool open 4
parfor n=0:9
    for i=1:10000
        for j=1:10000
            s=j*i
        end
    end
    n
end
disp('done')
Here is how one may write this in Python, particularly in a Jupyter notebook. You have to create a file in the working directory (I called it FunForParFor.py) which has the following:
def func(n):
    for i in range(10000):
        for j in range(10000):
            s = j * i
    print(n)
Then I go to my Jupyter notebook and write the following code
import multiprocessing
import FunForParFor

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(FunForParFor.func, range(10))
    pool.close()
    pool.join()
    print('done')
This has worked for me! I just wanted to share it here to give you a particular example.
Answered by Felix
I tried all the solutions here, but found that the simplest way and the closest equivalent to MATLAB's parfor is numba's prange.
Essentially you change a single letter in your loop, range to prange:
from numba import njit, prange  # autojit was removed from numba; njit(parallel=True) is the modern equivalent

@njit(parallel=True)
def parallel_sum(A):
    sum = 0.0
    for i in prange(A.shape[0]):
        sum += A[i]
    return sum
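A quick usage sketch for the function above (the array contents are arbitrary; note that the first call pays the JIT compilation cost, so benchmark the second call):

import numpy as np

A = np.random.rand(10_000_000)
print(parallel_sum(A))  # first call compiles; subsequent calls run the loop in parallel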
Answered by Ion Stoica
This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
To parallelize your example, you'd need to define your functions with the @ray.remote decorator, and then invoke them with .remote.
import numpy as np
import time
import ray

ray.init()

# Define the function. Each remote function will be executed
# in a separate process.
@ray.remote
def HeavyComputationThatIsThreadSafe(i, j):
    n = i * j
    time.sleep(0.5)  # Simulate some heavy computation.
    return n

N = 10
output_ids = []
for i in range(N):
    for j in range(N):
        # Remote functions return a future, i.e., an identifier to the
        # result, rather than the result itself. This allows invoking
        # the next remote function before the previous one has finished,
        # which leads to the remote functions being executed in parallel.
        output_ids.append(HeavyComputationThatIsThreadSafe.remote(i, j))

# Get results when ready.
output_list = ray.get(output_ids)
# Move results into an NxN numpy array.
outputs = np.array(output_list).reshape(N, N)

# This program should take approximately N*N*0.5s/p, where
# p is the number of cores on your machine, N*N
# is the number of times we invoke the remote function,
# and 0.5s is the time it takes to execute one instance
# of the remote function. For example, for two cores this
# program will take approximately 25sec.
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray, see this related post.
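As a sketch of what that portability looks like (treat the exact commands and flags as assumptions to verify against the Ray docs), moving from one machine to a cluster typically changes only how Ray is initialized:

import ray

# Single machine: a bare init starts a local Ray instance.
ray.init()

# Cluster: after `ray start --head` on the head node and
# `ray start --address=<head-node-ip>:6379` on each worker node,
# the same program connects to the cluster with:
#   ray.init(address="auto")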
Note: One point to keep in mind is that each remote function is executed in a separate process, possibly on a different machine, so the remote function's computation should take significantly longer than the cost of invoking it. As a rule of thumb, a remote function's computation should take at least a few tens of milliseconds to amortize its scheduling and startup overhead.
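When each iteration is too cheap to amortize that overhead, one common workaround (a sketch, not part of the original answer) is to batch many iterations into a single remote call:

import ray  # assumes ray.init() has already been called, as above

def cheap_computation(i, j):
    # Illustrative stand-in for per-element work that is too cheap
    # to justify one remote call per element.
    return i * j

@ray.remote
def compute_row(i, N):
    # One task computes a whole row, so each remote call carries
    # enough work to amortize the scheduling overhead.
    return [cheap_computation(i, j) for j in range(N)]

N = 1000
rows = ray.get([compute_row.remote(i, N) for i in range(N)])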
Answered by 0-_-0
I recommend trying joblib Parallel.
One-liner
from joblib import Parallel, delayed

# heavymethod is whatever expensive function you want to apply to each i
out = Parallel(n_jobs=2)(delayed(heavymethod)(i) for i in range(10))
Instructional
Instead of writing a for loop
from time import sleep

for _ in range(10):
    sleep(.2)
rewrite your operation as a list comprehension:
[sleep(.2) for _ in range(10)]
Now let us not evaluate the expression directly, but collect what should be done. This is what the delayed method is for.
from joblib import delayed
[delayed(sleep)(.2) for _ in range(10)]
Next, instantiate a Parallel object with n_jobs workers and process the list.
from joblib import Parallel
r = Parallel(n_jobs=2, verbose=10)(delayed(sleep)(.2) for _ in range(10))
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.6s
[Parallel(n_jobs=2)]: Done 4 tasks | elapsed: 0.8s
[Parallel(n_jobs=2)]: Done 10 out of 10 | elapsed: 1.4s finished

