Python how to do multiprocessing inside of a class?

Disclaimer: this page is a translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/29009790/
Asked by TheBear
I have a code structure that looks like this:
class A:
    def __init__(self):
        processes = []
        for i in range(1000):
            p = Process(target=self.RunProcess, args=i)
            processes.append[p]
        # Start all processes
        [x.start() for x in processes]

    def RunProcess(self, i):
        do something with i...
Main script:

myA = A()
I can't seem to get this to run. I get a runtime error "An attempt has been made to start a new process before the current process has finished its bootstrapping phase."
How do I get multiprocessing working for this? If I use threading, it works fine, but it is as slow as running sequentially... And I'm also afraid that multiprocessing will be slow because it takes longer for the processes to be created?
Any good tips? Many thanks in advance.
Answered by Dima Tisnek
A practical work-around is to break down your class, e.g. like this:
class A:
    def __init__(self, ...):
        pass

    def compute(self):
        procs = [Process(target=self.run, ...) for ... in ...]
        [p.start() for p in procs]
        [p.join() for p in procs]

    def run(self, ...):
        pass

pool = A(...)
pool.compute()
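For concreteness, here is one runnable reading of that sketch, with the ellipses filled in by hypothetical names (a job count n and an index argument i) that are not part of the original answer:

from multiprocessing import Process

class A:
    def __init__(self, n):
        # Plain initialisation only; no processes are started here.
        self.n = n

    def compute(self):
        # Fork only after __init__ has completed, so self is fully built.
        procs = [Process(target=self.run, args=(i,)) for i in range(self.n)]
        [p.start() for p in procs]
        [p.join() for p in procs]

    def run(self, i):
        print(i * i)

if __name__ == '__main__':
    pool = A(4)
    pool.compute()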
When you fork a process inside __init__, the class instance self may not be fully initialised anyway, thus it's odd to ask a subprocess to execute self.run, although technically, yes, it's possible.
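A contrived sketch (not from the answer) of how this can bite: the child process sees self as it exists at the moment start() is called, so attributes assigned later in __init__ are missing in the child:

from multiprocessing import Process

class Bad:
    def __init__(self):
        p = Process(target=self.run)
        p.start()               # the child gets a copy of self *now*
        self.data = [1, 2, 3]   # too late: the child's copy lacks .data
        p.join()

    def run(self):
        print(self.data)        # AttributeError in the child process

if __name__ == '__main__':
    Bad()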
If it's not that, then it sounds like an instance of this issue:
Answered by Haleemur Ali
There are a couple of syntax issues that I can see in your code:
- args in Process expects a tuple, but you pass an integer; please change line 5 to: p = Process(target=self.RunProcess, args=(i,))
- list.append is a method, and arguments passed to it should be enclosed in (), not []; please change line 6 to: processes.append(p)
As @qarma points out, it's not good practice to start the processes in the class constructor. I would structure the code as follows (adapting your example):
import multiprocessing as mp
from time import sleep

class A(object):
    def __init__(self, *args, **kwargs):
        # do other stuff
        pass

    def do_something(self, i):
        sleep(0.2)
        print('%s * %s = %s' % (i, i, i*i))

    def run(self):
        processes = []
        for i in range(1000):
            p = mp.Process(target=self.do_something, args=(i,))
            processes.append(p)
        [x.start() for x in processes]

if __name__ == '__main__':
    a = A()
    a.run()
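Note that the if __name__ == '__main__': guard in this version is what avoids the "bootstrapping phase" RuntimeError from the question: on platforms that spawn rather than fork (e.g. Windows), each child re-imports the main module, and without the guard the module-level code would try to start new processes recursively.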
Answered by Mike McKerns
It should simplify things for you to use a Pool. As far as speed, starting up the processes does take time. However, using a Pool, as opposed to running njobs instances of Process, should be as fast as you can get it to run with processes. The default setting for a Pool (as used below) is to use the maximum number of processes available (i.e. the number of CPUs you have), and to keep farming out new jobs to a worker as soon as a job completes. You won't get njobs-way parallelism, but you'll get as much parallelism as your CPUs can handle without oversubscribing your processors. I'm using pathos, which has a fork of multiprocessing, because it's a bit more robust than standard multiprocessing… and, well, I'm also the author. But you could probably use multiprocessing for this.
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> class A(object):
...     def __init__(self, njobs=1000):
...         self.map = Pool().map
...         self.njobs = njobs
...         self.start()
...     def start(self):
...         self.result = self.map(self.RunProcess, range(self.njobs))
...         return self.result
...     def RunProcess(self, i):
...         return i*i
...
>>> myA = A()
>>> myA.result[:11]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> myA.njobs = 3
>>> myA.start()
[0, 1, 4]
It's a bit of an odd design to start the Pool inside of __init__. But if you want to do that, you have to get results from something like self.result… and you can use self.start for subsequent calls.
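Since the answer notes that standard multiprocessing could probably be used instead, here is a minimal sketch of the same idea with the standard library's multiprocessing.Pool, assuming Python 3 (where bound methods pickle); the Pool is deliberately not stored on self, because a Pool itself cannot be pickled:

from multiprocessing import Pool

class A(object):
    def __init__(self, njobs=1000):
        self.njobs = njobs

    def start(self):
        # The workers pickle self.RunProcess (and therefore self),
        # so self must hold only picklable attributes.
        with Pool() as pool:
            self.result = pool.map(self.RunProcess, range(self.njobs))
        return self.result

    def RunProcess(self, i):
        return i * i

if __name__ == '__main__':
    myA = A()
    print(myA.start()[:11])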
Get pathos here: https://github.com/uqfoundation