python中的并行处理
声明:本页面是 Stack Overflow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将其归于原作者(而不是我):Stack Overflow
原文地址: http://stackoverflow.com/questions/3842237/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Parallel Processing in Python
提问 by calccrypto
What's a simple example of code that does parallel processing in Python 2.7? All the examples I've found online are convoluted and include unnecessary code.
在python 2.7中进行并行处理的简单代码是什么?我在网上找到的所有示例都很复杂,并且包含不必要的代码。
How would I write a simple brute-force integer-factoring program that factors one integer on each of my cores (4)? My real program probably only needs 2 cores, and the processes need to share information.
我要怎样写一个简单的蛮力整数因式分解程序,让每个核心(共 4 个)各分解一个整数?我的实际程序可能只需要 2 个核心,而且进程之间需要共享信息。
I know that parallel-python and other libraries exist, but I want to keep the number of libraries used to a minimum; thus I want to use the thread and/or multiprocessing libraries, since they come with Python.
我知道有 parallel-python 等其他库,但我想把用到的库数量保持在最低限度,因此我想使用 thread 和/或 multiprocessing 库,因为它们是 Python 自带的。
采纳答案 by Jonathan Dursi
A good, simple way to start with parallel processing in Python is the pool map in multiprocessing -- it works like the usual Python map, but the individual function calls are spread out over a number of separate processes.
在 Python 中入门并行处理的一个简单好方法是 multiprocessing 中的池映射(pool map)——它用起来就像普通的 Python map,只是各个函数调用被分摊到若干个独立的进程上。
Factoring is a nice example of this -- you can brute-force check all the candidate divisors, spreading the work across all available processes:
因式分解就是一个很好的例子——你可以暴力检查所有候选除数,并把这些工作分摊到所有可用的进程上:
from multiprocessing import Pool
import numpy
numToFactor = 976
def isFactor(x):
    result = None
    div = (numToFactor / x)
    if div*x == numToFactor:
        result = (x, div)
    return result

if __name__ == '__main__':
    pool = Pool(processes=4)
    possibleFactors = range(1, int(numpy.floor(numpy.sqrt(numToFactor))) + 1)
    print 'Checking ', possibleFactors
    result = pool.map(isFactor, possibleFactors)
    cleaned = [x for x in result if x is not None]
    print 'Factors are', cleaned
This gives me
这给了我
Checking [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
Factors are [(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
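The example above targets Python 2.7, where print is a statement and / performs integer division. A hedged sketch of the same Pool.map approach for modern Python 3 (math.isqrt needs Python 3.8+; the divmod change is required because / is float division in Python 3):

```python
from multiprocessing import Pool
import math

numToFactor = 976

def isFactor(x):
    # divmod gives the integer quotient and remainder; x is a factor
    # exactly when the remainder is zero.
    div, rem = divmod(numToFactor, x)
    return (x, div) if rem == 0 else None

if __name__ == '__main__':
    possibleFactors = range(1, math.isqrt(numToFactor) + 1)
    with Pool(processes=4) as pool:
        result = pool.map(isFactor, possibleFactors)
    cleaned = [x for x in result if x is not None]
    print('Factors are', cleaned)
```

The `if __name__ == '__main__':` guard matters more on Python 3 platforms that spawn (rather than fork) worker processes, since each worker re-imports the module.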
回答 by Mike McKerns
I agree that using Pool from multiprocessing is probably the best route if you want to stay within the standard library. If you are interested in doing other types of parallel processing, but without learning anything new (i.e. still using the same interface as multiprocessing), then you could try pathos, which provides several forms of parallel maps and has pretty much the same interface as multiprocessing does.
我同意,如果你想只用标准库,multiprocessing 的 Pool 可能是最好的选择。如果你对其他类型的并行处理感兴趣,但又不想学习新东西(即仍然使用与 multiprocessing 相同的接口),那么可以试试 pathos,它提供了多种形式的并行 map,接口与 multiprocessing 几乎相同。
Python 2.7.6 (default, Nov 12 2013, 13:26:39)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numToFactor = 976
>>> def isFactor(x):
...     result = None
...     div = (numToFactor / x)
...     if div*x == numToFactor:
...         result = (x, div)
...     return result
...
>>> from pathos.multiprocessing import ProcessingPool as MPool
>>> p = MPool(4)
>>> possible = range(1,int(numpy.floor(numpy.sqrt(numToFactor)))+1)
>>> # standard blocking map
>>> result = [x for x in p.map(isFactor, possible) if x is not None]
>>> print result
[(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
>>>
>>> # asynchronous map (there's also iterative maps too)
>>> obj = p.amap(isFactor, possible)
>>> obj
<processing.pool.MapResult object at 0x108efc450>
>>> print [x for x in obj.get() if x is not None]
[(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
>>>
>>> # there's also parallel-python maps (blocking, iterative, and async)
>>> from pathos.pp import ParallelPythonPool as PPool
>>> q = PPool(4)
>>> result = [x for x in q.map(isFactor, possible) if x is not None]
>>> print result
[(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
Also, pathos has a sister package with the same interface, called pyina, which is built on mpi4py and provides parallel maps that run under MPI, and which can be launched using several different schedulers.
此外,pathos 还有一个接口相同的姊妹包 pyina,它基于 mpi4py,提供在 MPI 下运行的并行 map,并且可以通过多种调度器启动。
One other advantage is that pathos comes with a much better serializer than you get in standard Python, so it's much more capable than multiprocessing at serializing a range of functions and other things. And you can do everything from the interpreter.
另一个优点是,pathos 自带的序列化器比标准 Python 的强得多,因此在序列化各种函数和其他对象方面,它比 multiprocessing 能力强得多。而且所有这些都可以直接在解释器里完成。
>>> class Foo(object):
...     b = 1
...     def factory(self, a):
...         def _square(x):
...             return a*x**2 + self.b
...         return _square
...
>>> f = Foo()
>>> f.b = 100
>>> g = f.factory(-1)
>>> p.map(g, range(10))
[100, 99, 96, 91, 84, 75, 64, 51, 36, 19]
>>>
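What makes the closure example above work is pathos's serializer (the dill package), which can pickle objects the standard library cannot. A quick sketch, for comparison, of why the stdlib pickle module that multiprocessing relies on cannot ship such a function to a worker process (the Foo class mirrors the session above):

```python
import pickle

class Foo(object):
    b = 1
    def factory(self, a):
        def _square(x):
            return a * x**2 + self.b
        return _square

f = Foo()
f.b = 100
g = f.factory(-1)

# g is a closure defined inside a method. pickle serializes functions
# only by reference to a module-level name, so it cannot handle g,
# and multiprocessing would fail the same way when sending g to a worker.
try:
    pickle.dumps(g)
except (pickle.PicklingError, AttributeError) as e:
    print('pickle failed:', e)
```

dill, by contrast, serializes the function body and its closed-over state, which is why the pathos pool can map g across processes.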
Get the code here: https://github.com/uqfoundation
在此处获取代码:https://github.com/uqfoundation
回答 by Ion Stoica
This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
这可以通过Ray优雅地完成,这是一个允许您轻松并行化和分发 Python 代码的系统。
To parallelize your example, you'd need to define your map function with the @ray.remote decorator, and then invoke it with .remote. This ensures that every instance of the remote function will be executed in a different process.
要并行化你的示例,你需要用 @ray.remote 装饰器定义你的 map 函数,然后通过 .remote 来调用它。这样可以确保远程函数的每个实例都在不同的进程中执行。
import ray
ray.init()
# Define the function to compute the factors of a number as a remote function.
# This will make sure that a call to this function will run it in a different
# process.
@ray.remote
def compute_factors(x):
    factors = []
    for i in range(1, x + 1):
        if x % i == 0:
            factors.append(i)
    return factors
# List of inputs.
inputs = [67, 24, 18, 312]
# Call a copy of compute_factors() on each element in inputs.
# Each copy will be executed in a separate process.
# Note that a remote function returns a future, i.e., an
# identifier of the result, rather than the result itself.
# This makes the calls to the remote function non-blocking,
# which lets us invoke many remote functions in parallel.
result_ids = [compute_factors.remote(x) for x in inputs]
# Now get the results
results = ray.get(result_ids)
# Print the results.
for i in range(len(inputs)):
print("The factors of", inputs[i], "are", results[i])
There are a number of advantages to using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray, see this related post.
与 multiprocessing 模块相比,使用 Ray 有许多优点。特别是,同样的代码既可以在单台机器上运行,也可以在机器集群上运行。有关 Ray 的更多优势,请参阅这篇相关帖子。
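For readers who, like the original asker, want to stay within the standard library: on Python 3, the concurrent.futures module offers a futures-style interface similar in spirit to Ray's remote calls, with no third-party dependency. A sketch (not part of any answer above), reusing a plain version of the factoring function:

```python
from concurrent.futures import ProcessPoolExecutor

def compute_factors(x):
    # Collect every divisor of x by trial division.
    return [i for i in range(1, x + 1) if x % i == 0]

if __name__ == '__main__':
    inputs = [67, 24, 18, 312]
    with ProcessPoolExecutor(max_workers=4) as ex:
        # submit() returns a Future immediately (non-blocking);
        # result() blocks until that worker finishes.
        futures = [ex.submit(compute_factors, x) for x in inputs]
        for x, fut in zip(inputs, futures):
            print('The factors of', x, 'are', fut.result())
```

Unlike Ray, this is limited to a single machine, but it keeps the question's original constraint of using only what ships with Python.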

