Python how to do multiprocessing inside of a class?

Disclaimer: this page is a translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/29009790/
Asked by TheBear
I have a code structure that looks like this:
class A:
    def __init__(self):
        processes = []
        for i in range(1000):
            p = Process(target=self.RunProcess, args=i)
            processes.append[p]
        # Start all processes
        [x.start() for x in processes]

    def RunProcess(self, i):
        do something with i...
Main script:

myA = A()
I can't seem to get this to run. I get a runtime error "An attempt has been made to start a new process before the current process has finished its bootstrapping phase."
How do I get multiprocessing working for this? If I use threading, it works fine, but it is as slow as running sequentially... And I'm also afraid that multiprocessing will be slow because it takes longer for the processes to be created?
Any good tips? Many thanks in advance.
Answered by Dima Tisnek
A practical work-around is to break down your class, e.g. like this:
class A:
    def __init__(self, ...):
        pass

    def compute(self):
        procs = [Process(target=self.run, ...) for ... in ...]
        [p.start() for p in procs]
        [p.join() for p in procs]

    def run(self, ...):
        pass

pool = A(...)
pool.compute()
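For concreteness, here is one runnable reading of that sketch, with the ellipses filled in by hypothetical names (a job count n and an index argument i) that are not part of the original answer:

from multiprocessing import Process

class A:
    def __init__(self, n):
        # Plain initialisation only; no processes are started here.
        self.n = n

    def compute(self):
        # Fork only after __init__ has completed, so self is fully built.
        procs = [Process(target=self.run, args=(i,)) for i in range(self.n)]
        [p.start() for p in procs]
        [p.join() for p in procs]

    def run(self, i):
        print(i * i)

if __name__ == '__main__':
    pool = A(4)
    pool.compute()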
When you fork a process inside __init__, the class instance self may not be fully initialised anyway, thus it's odd to ask a subprocess to execute self.run, although technically, yes, it's possible.
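A contrived sketch (not from the answer) of how this can bite: the child process sees self as it exists at the moment start() is called, so attributes assigned later in __init__ are missing in the child:

from multiprocessing import Process

class Bad:
    def __init__(self):
        p = Process(target=self.run)
        p.start()               # the child gets a copy of self *now*
        self.data = [1, 2, 3]   # too late: the child's copy lacks .data
        p.join()

    def run(self):
        print(self.data)        # AttributeError in the child process

if __name__ == '__main__':
    Bad()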
If it's not that, then it sounds like an instance of this issue:
Answered by Haleemur Ali
There are a couple of syntax issues that I can see in your code:
- args in Process expects a tuple, but you pass an integer; please change line 5 to: p = Process(target=self.RunProcess, args=(i,))
- list.append is a method, and arguments passed to it should be enclosed in (), not []; please change line 6 to: processes.append(p)
As @qarma points out, it's not good practice to start the processes in the class constructor. I would structure the code as follows (adapting your example):
import multiprocessing as mp
from time import sleep

class A(object):
    def __init__(self, *args, **kwargs):
        # do other stuff
        pass

    def do_something(self, i):
        sleep(0.2)
        print('%s * %s = %s' % (i, i, i*i))

    def run(self):
        processes = []
        for i in range(1000):
            p = mp.Process(target=self.do_something, args=(i,))
            processes.append(p)
        [x.start() for x in processes]

if __name__ == '__main__':
    a = A()
    a.run()
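Note that the if __name__ == '__main__': guard in this version is what avoids the "bootstrapping phase" RuntimeError from the question: on platforms that spawn rather than fork (e.g. Windows), each child re-imports the main module, and without the guard the module-level code would try to start new processes recursively.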
Answered by Mike McKerns
It should simplify things for you to use a Pool. As far as speed, starting up the processes does take time. However, using a Pool, as opposed to running njobs instances of Process, should be as fast as you can get it to run with processes. The default setting for a Pool (as used below) is to use the maximum number of processes available (i.e. the number of CPUs you have), and to keep farming out new jobs to a worker as soon as a job completes. You won't get njobs-way parallelism, but you'll get as much parallelism as your CPUs can handle without oversubscribing your processors. I'm using pathos, which has a fork of multiprocessing, because it's a bit more robust than standard multiprocessing… and, well, I'm also the author. But you could probably use multiprocessing for this.
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> class A(object):
...     def __init__(self, njobs=1000):
...         self.map = Pool().map
...         self.njobs = njobs
...         self.start()
...     def start(self):
...         self.result = self.map(self.RunProcess, range(self.njobs))
...         return self.result
...     def RunProcess(self, i):
...         return i*i
...
>>> myA = A()
>>> myA.result[:11]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> myA.njobs = 3
>>> myA.start()
[0, 1, 4]
It's a bit of an odd design to start the Pool inside of __init__. But if you want to do that, you have to get results from something like self.result… and you can use self.start for subsequent calls.
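Since the answer notes that standard multiprocessing could probably be used instead, here is a minimal sketch of the same idea with the standard library's multiprocessing.Pool, assuming Python 3 (where bound methods pickle); the Pool is deliberately not stored on self, because a Pool itself cannot be pickled:

from multiprocessing import Pool

class A(object):
    def __init__(self, njobs=1000):
        self.njobs = njobs

    def start(self):
        # The workers pickle self.RunProcess (and therefore self),
        # so self must hold only picklable attributes.
        with Pool() as pool:
            self.result = pool.map(self.RunProcess, range(self.njobs))
        return self.result

    def RunProcess(self, i):
        return i * i

if __name__ == '__main__':
    myA = A()
    print(myA.start()[:11])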
Get pathos here: https://github.com/uqfoundation