Sharing a complex object between Python processes?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow

Original URL: http://stackoverflow.com/questions/3671666/
Asked by Paul
I have a fairly complex Python object that I need to share between multiple processes. I launch these processes using multiprocessing.Process. When I share an object with multiprocessing.Queue and multiprocessing.Pipe in it, they are shared just fine. But when I try to share an object with other non-multiprocessing-module objects, it seems like Python forks these objects. Is that true?
I tried using multiprocessing.Value, but I'm not sure what the type should be. My object class is called MyClass, and when I try multiprocessing.Value(MyClass, instance), it fails with:
TypeError: this type has no size
Any idea what's going on?
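(For context: multiprocessing.Value expects a ctypes type code or a ctypes class, which is why passing an arbitrary Python class fails there. A quick illustration, with MyClass standing in for the asker's class:)

import multiprocessing

class MyClass(object):
    pass

num = multiprocessing.Value('i', 0)  # OK: 'i' is the ctypes type code for a C int
num.value = 5
# multiprocessing.Value(MyClass, MyClass())  # TypeError: this type has no size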
Accepted answer by David
You can do this using Python's Multiprocessing "Manager" classes and a proxy class that you define. From the Python docs: http://docs.python.org/library/multiprocessing.html#proxy-objects
What you want to do is define a proxy class for your custom object, and then share the object using a "Remote Manager" -- look at the examples in the same linked doc page for "remote manager" where the docs show how to share a remote queue. You're going to be doing the same thing, but your call to your_manager_instance.register() will include your custom proxy class in its argument list.
In this manner, you're setting up a server to share the custom object with a custom proxy. Your clients need access to the server (again, see the excellent documentation examples of how to setup client/server access to a remote queue, but instead of sharing a queue, you are sharing access to your specific class).
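For reference, here is a minimal sketch of the remote-manager pattern from the linked docs, using the default auto-generated proxy rather than a hand-written proxy class; MyClass, the address, and the authkey are illustrative assumptions, not part of the original answer:

from multiprocessing.managers import BaseManager

class MyClass(object):
    """The 'complex' object to be shared."""
    def __init__(self):
        self.value = 0
    def set(self, value):
        self.value = value
    def get(self):
        return self.value

# --- server process ---
class ServerManager(BaseManager):
    pass

ServerManager.register('MyClass', MyClass)

def serve():
    manager = ServerManager(address=('', 50000), authkey=b'secret')
    manager.get_server().serve_forever()

# --- client process ---
class ClientManager(BaseManager):
    pass

ClientManager.register('MyClass')  # no callable needed on the client side

def use():
    manager = ClientManager(address=('localhost', 50000), authkey=b'secret')
    manager.connect()
    proxy = manager.MyClass()  # the object itself lives in the server process
    proxy.set(42)
    print(proxy.get())         # -> 42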
Answered by Tom
After a lot of research and testing, I found that "Manager" does this job at a non-complex object level.
The code below shows that the object inst is shared between processes, which means the property var of inst is changed outside when the child process changes it.
from multiprocessing import Process
from multiprocessing.managers import BaseManager

class SimpleClass(object):
    def __init__(self):
        self.var = 0

    def set(self, value):
        self.var = value

    def get(self):
        return self.var

def change_obj_value(obj):
    obj.set(100)

if __name__ == '__main__':
    BaseManager.register('SimpleClass', SimpleClass)
    manager = BaseManager()
    manager.start()

    inst = manager.SimpleClass()

    p = Process(target=change_obj_value, args=[inst])
    p.start()
    p.join()

    print(inst)        # <__main__.SimpleClass object at 0x10cf82350>
    print(inst.get())  # 100
Okay, the above code is enough if you only need to share simple objects.
Why not complex ones? Because it may fail if your object is nested (an object inside an object):
from multiprocessing import Process
from multiprocessing.managers import BaseManager

class GetSetter(object):
    def __init__(self):
        self.var = None

    def set(self, value):
        self.var = value

    def get(self):
        return self.var

class ChildClass(GetSetter):
    pass

class ParentClass(GetSetter):
    def __init__(self):
        self.child = ChildClass()
        GetSetter.__init__(self)

    def getChild(self):
        return self.child

def change_obj_value(obj):
    obj.set(100)
    obj.getChild().set(100)

if __name__ == '__main__':
    BaseManager.register('ParentClass', ParentClass)
    manager = BaseManager()
    manager.start()

    inst2 = manager.ParentClass()

    p2 = Process(target=change_obj_value, args=[inst2])
    p2.start()
    p2.join()

    print(inst2)                   # <__main__.ParentClass object at 0x10cf82350>
    print(inst2.getChild())        # <__main__.ChildClass object at 0x10cf6dc50>
    print(inst2.get())             # 100
    # good!
    print(inst2.getChild().get())  # None
    # bad! you need to register the child class too, but there's almost no way to do it
    # even if you did register the child class, you may get a PicklingError :)
I think the main reason for this behavior is that Manager is just a candy bar built on top of low-level communication tools like pipes/queues.
So, this approach is not well recommended for the multiprocessing case. It's always better if you can use low-level tools like lock/semaphore/pipe/queue, or high-level tools like a Redis queue or Redis publish/subscribe, for complicated use cases (only my recommendation lol). A queue-based sketch follows below.
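As one example of the low-level route, here is a minimal sketch using a plain multiprocessing.Queue; the worker logic and the sentinel convention are illustrative assumptions:

from multiprocessing import Process, Queue

def worker(task_queue, result_queue):
    # each process owns its own data; only messages cross the boundary
    for item in iter(task_queue.get, None):  # None acts as a stop sentinel
        result_queue.put(item * 2)

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    p = Process(target=worker, args=(tasks, results))
    p.start()
    for i in range(5):
        tasks.put(i)
    tasks.put(None)           # tell the worker to stop
    for _ in range(5):
        print(results.get())  # 0, 2, 4, 6, 8
    p.join()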
Answered by Lenar Hoyt
To save some headaches with shared resources, you can try to collect data that needs access to a singleton resource in the return statement of the function that is mapped by e.g. pool.imap_unordered, and then further process it in a loop that retrieves the partial results:
for result in pool.imap_unordered(process_function, iterable_data):
    do_something(result)
If it is not much data that gets returned, then there might not be much overhead in doing this.
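A fuller sketch of that pattern, where process_function, do_something, and the pool size are illustrative assumptions:

from multiprocessing import Pool

def process_function(item):
    # heavy work happens in the worker; only the small result crosses back
    return item * item

def do_something(result):
    # runs in the parent, the only process that touches the shared resource
    print(result)

if __name__ == '__main__':
    iterable_data = range(10)
    with Pool(processes=4) as pool:
        for result in pool.imap_unordered(process_function, iterable_data):
            do_something(result)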
Answered by Duje
Here's a Python package I made just for that (sharing complex objects between processes).
git: https://github.com/dRoje/pipe-proxy
The idea is you create a proxy for your object and pass it to a process. Then you use the proxy as if you had a reference to the original object. You can only use method calls, though, so accessing object variables is done through setters and getters.
Say we have an object called 'example'; creating the proxy and proxy listener is easy:
from pipeproxy import proxy
example = Example()
exampleProxy, exampleProxyListener = proxy.createProxy(example)
Now you send the proxy to another process.
p = Process(target=someMethod, args=(exampleProxy,))
p.start()
Use it in the other process as you would use the original object (example):
def someMethod(exampleProxy):
...
exampleProxy.originalExampleMethod()
...
But you do have to listen to it in the main process:
exampleProxyListener.listen()
Read more and find examples here:
http://matkodjipalo.com/index.php/2017/11/12/proxy-solution-python-multiprocessing/
Answered by Zhong
I tried to use BaseManager and register my customized class to make it happy, and I ran into the problem with nested classes, just as Tom mentioned above.
I think the main reason is not the nesting itself, but the low-level communication mechanism Python uses. Python uses a socket-like communication mechanism to synchronize modifications of the customized class within a server process at a low level. I think it encapsulates some RPC methods and makes them transparent to the user, as if they were calling the local methods of a nested class object.
So, when you want to modify or retrieve your self-defined objects or some third-party objects, you should define some interfaces within your processes to communicate with them, rather than directly getting or setting values.
Yet when operating on objects nested inside the nested objects, one can ignore the issues mentioned above and work just as in a common routine, because the nested objects inside the registered class are no longer proxy objects; operations on them do not go through the socket-like communication routine again and are local.
Here is the workable code I wrote to solve the problem.
from multiprocessing import Process
from multiprocessing.managers import BaseManager
import numpy as np

class NestedObj(object):
    def __init__(self):
        self.val = 1

class CustomObj(object):
    def __init__(self, numpy_obj):
        self.numpy_obj = numpy_obj
        self.nested_obj = NestedObj()

    def set_value(self, p, q, v):
        self.numpy_obj[p, q] = v

    def get_obj(self):
        return self.numpy_obj

    def get_nested_obj(self):
        return self.nested_obj.val

class CustomProcess(Process):
    def __init__(self, obj, p, q, v):
        super(CustomProcess, self).__init__()
        self.obj = obj
        self.index = p, q
        self.v = v

    def run(self):
        self.obj.set_value(*self.index, self.v)

if __name__ == "__main__":
    BaseManager.register('CustomObj', CustomObj)
    manager = BaseManager()
    manager.start()
    data = [[0 for x in range(10)] for y in range(10)]
    matrix = np.matrix(data)
    custom_obj = manager.CustomObj(matrix)
    print(custom_obj.get_obj())
    process_list = []
    for p in range(10):
        for q in range(10):
            proc = CustomProcess(custom_obj, p, q, 10*p+q)
            process_list.append(proc)
    for x in range(100):
        process_list[x].start()
    for x in range(100):
        process_list[x].join()
    print(custom_obj.get_obj())
    print(custom_obj.get_nested_obj())
Answered by Frederik Petersen
In Python 3.6 the docs say:
Changed in version 3.6: Shared objects are capable of being nested. For example, a shared container object such as a shared list can contain other shared objects which will all be managed and synchronized by the SyncManager.
As long as instances are created through the SyncManager, you should be able to make the objects reference each other. Dynamic creation of one type of object in the methods of another type of object might still be impossible or very tricky though.
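A minimal sketch of that nesting with SyncManager's built-in container types (the names and values here are illustrative):

from multiprocessing import Process, Manager

def worker(shared_list):
    # the dict inside the shared list is itself a managed (proxied) object,
    # so this mutation is visible to the parent process
    shared_list[0]['count'] = 42

if __name__ == '__main__':
    manager = Manager()              # a started SyncManager
    inner = manager.dict(count=0)    # shared dict
    outer = manager.list([inner])    # shared list containing the shared dict
    p = Process(target=worker, args=(outer,))
    p.start()
    p.join()
    print(outer[0]['count'])         # 42 (requires Python >= 3.6)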
Edit: I stumbled upon this issue, Multiprocessing managers and custom classes, with Python 3.6.5 and 3.6.7. Need to check Python 3.7.
Edit 2: Due to some other issues I can't currently test this with Python 3.7. The workaround provided in https://stackoverflow.com/a/50878600/7541006 works fine for me.

