Python: Sharing a Large, Read-Only NumPy Array Between Multiprocessing Processes
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must follow the same CC BY-SA terms, link the original address, and attribute the content to the original authors (not me): StackOverflow
Original address: http://stackoverflow.com/questions/17785275/
Share Large, Read-Only Numpy Array Between Multiprocessing Processes
Asked by Will
I have a 60GB SciPy Array (Matrix) I must share between 5+ multiprocessing Process objects. I've seen numpy-sharedmem and read this discussion on the SciPy list. There seem to be two approaches: numpy-sharedmem, or using a multiprocessing.RawArray() and mapping NumPy dtypes to ctypes. Now, numpy-sharedmem seems to be the way to go, but I've yet to see a good reference example. I don't need any kind of locks, since the array (actually a matrix) will be read-only. Now, due to its size, I'd like to avoid a copy. It sounds like the correct method is to create the only copy of the array as a sharedmem array, and then pass it to the Process objects? A couple of specific questions:
1. What's the best way to actually pass the sharedmem handles to sub-Process()es? Do I need a queue just to pass one array around? Would a pipe be better? Can I just pass it as an argument to the Process() subclass's init (where I'm assuming it's pickled)?
2. In the discussion I linked above, there's mention of numpy-sharedmem not being 64-bit-safe? I'm definitely using some structures that aren't 32-bit addressable.
3. Are there tradeoffs to the RawArray() approach? Slower, buggier?
4. Do I need any ctype-to-dtype mapping for the numpy-sharedmem method?
5. Does anyone have an example of some open-source code doing this? I'm a very hands-on learner and it's hard to get this working without any kind of good example to look at.
If there's any additional info I can provide to help clarify this for others, please comment and I'll add. Thanks!
This needs to run on Ubuntu Linux and maybe Mac OS, but portability isn't a huge concern.
Accepted answer by James Lim
@Velimir Mlaker gave a great answer. I thought I could add some bits of comments and a tiny example.
(I couldn't find much documentation on sharedmem - these are the results of my own experiments.)
- Do you need to pass the handles when the subprocess is starting, or after it has started? If it's just the former, you can just use the target and args arguments for Process. This is potentially better than using a global variable.
- From the discussion page you linked, it appears that support for 64-bit Linux was added to sharedmem a while back, so it could be a non-issue.
- I don't know about this one.
- No. Refer to the example below.
Example
#!/usr/bin/env python
from multiprocessing import Process
import sharedmem
import numpy

def do_work(data, start):
    data[start] = 0

def split_work(num):
    n = 20
    width = n // num  # integer division so the offsets stay ints
    shared = sharedmem.empty(n)
    shared[:] = numpy.random.rand(1, n)[0]
    print("values are %s" % shared)

    processes = [Process(target=do_work, args=(shared, i * width))
                 for i in range(num)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print("values are %s" % shared)
    print("type is %s" % type(shared[0]))

if __name__ == '__main__':
    split_work(4)

Output

values are [ 0.81397784 0.59667692 0.10761908 0.6736734 0.46349645 0.98340718
0.44056863 0.10701816 0.67167752 0.29158274 0.22242552 0.14273156
0.34912309 0.43812636 0.58484507 0.81697513 0.57758441 0.4284959
0.7292129 0.06063283]
values are [ 0. 0.59667692 0.10761908 0.6736734 0.46349645 0.
0.44056863 0.10701816 0.67167752 0.29158274 0. 0.14273156
0.34912309 0.43812636 0.58484507 0. 0.57758441 0.4284959
0.7292129 0.06063283]
type is <class 'numpy.float64'>
This related question might be useful.
Answer by Dr. Jan-Philip Gehrcke
If you are on Linux (or any POSIX-compliant system), you can define this array as a global variable. multiprocessing uses fork() on Linux when it starts a new child process. A newly spawned child process automatically shares the memory with its parent as long as it does not change it (copy-on-write mechanism).
Since you are saying "I don't need any kind of locks, since the array (actually a matrix) will be read-only" taking advantage of this behavior would be a very simple and yet extremely efficient approach: all child processes will access the same data in physical memory when reading this large numpy array.
Don't hand your array to the Process() constructor; this will instruct multiprocessing to pickle the data to the child, which would be extremely inefficient or impossible in your case. On Linux, right after fork() the child is an exact copy of the parent using the same physical memory, so all you need to do is make sure that the Python variable 'containing' the matrix is accessible from within the target function that you hand over to Process(). You can typically achieve this with a 'global' variable.
Example code:
from multiprocessing import Process
from numpy import random

global_array = random.random(10**4)

def child():
    # The child reads the parent's array via copy-on-write shared pages.
    print(sum(global_array))

def main():
    processes = [Process(target=child) for _ in range(10)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

if __name__ == "__main__":
    main()
On Windows -- which does not support fork() -- multiprocessing uses the win32 API call CreateProcess. It creates an entirely new process from any given executable. That's why on Windows one is required to pickle data to the child if one needs data that has been created during the runtime of the parent.
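To illustrate the difference, here is a small sketch (not from the original answer) using the spawn start method, which mimics Windows behavior on any platform in Python 3: the array handed to args is pickled into the child, so the child works on a copy.

```python
import multiprocessing as mp
import numpy as np

def child(arr):
    # arr arrived as a pickled copy; changing it cannot affect the parent.
    arr[:] = 0

def main():
    data = np.ones(1000)
    p = mp.Process(target=child, args=(data,))  # data is pickled here
    p.start()
    p.join()
    print(data.sum())  # still 1000.0 -- the parent's array is untouched

if __name__ == "__main__":
    mp.set_start_method("spawn")  # the only start method available on Windows
    main()
```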
Answer by Saullo G. P. Castro
If your array is that big, you can use numpy.memmap. For example, if you have an array stored on disk, say 'test.array', you can use simultaneous processes to access the data in it even in "writing" mode, but your case is simpler since you only need "reading" mode.
Creating the array:
import numpy as np

a = np.memmap('test.array', dtype='float32', mode='w+', shape=(100000, 1000))
You can then fill this array in the same way you do with an ordinary array. For example:
a[:10, :100] = 1.
a[10:, 100:] = 2.
The data is written to disk when you delete the variable a.
Later on you can use multiple processes that will access the data in test.array:
# read-only mode
b = np.memmap('test.array', dtype='float32', mode='r', shape=(100000,1000))
# read-and-write mode
c = np.memmap('test.array', dtype='float32', mode='r+', shape=(100000,1000))
Answer by Velimir Mlaker
You may be interested in a tiny piece of code I wrote: github.com/vmlaker/benchmark-sharedmem
The only file of interest is main.py. It's a benchmark of numpy-sharedmem -- the code simply passes arrays (either numpy or sharedmem) to spawned processes, via Pipe. The workers just call sum() on the data. I was only interested in comparing the data communication times between the two implementations.
I also wrote another, more complex code: github.com/vmlaker/sherlock.
Here I use the numpy-sharedmem module for real-time image processing with OpenCV -- the images are NumPy arrays, as per OpenCV's newer cv2 API. The images, actually references thereof, are shared between processes via the dictionary object created from multiprocessing.Manager (as opposed to using Queue or Pipe). I'm getting great performance improvements when compared with using plain NumPy arrays.
Pipe vs. Queue:
In my experience, IPC with Pipe is faster than Queue. And that makes sense, since Queue adds locking to make it safe for multiple producers/consumers. Pipe doesn't. But if you only have two processes talking back-and-forth, it's safe to use Pipe, or, as the docs read:
... there is no risk of corruption from processes using different ends of the pipe at the same time.
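As a minimal illustration (my own sketch, not from the repository) of two processes talking over a Pipe -- a plain ndarray is pickled through the pipe, whereas a sharedmem array would send only a lightweight handle:

```python
from multiprocessing import Process, Pipe
import numpy as np

def worker(conn):
    arr = conn.recv()     # receive the array from the parent
    conn.send(arr.sum())  # send a result back the other way
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(np.arange(10.0))
    print(parent_conn.recv())  # 45.0
    p.join()
```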
sharedmem safety:
The main issue with the sharedmem module is the possibility of a memory leak upon ungraceful program exit. This is described in a lengthy discussion here. Although on Apr 10, 2011 Sturla mentioned a fix to the memory leak, I have still experienced leaks since then, using both repos: Sturla Molden's own on GitHub (github.com/sturlamolden/sharedmem-numpy) and Chris Lee-Messer's on Bitbucket (bitbucket.org/cleemesser/numpy-sharedmem).
Answer by Steve Barnes
You may also find it useful to take a look at the documentation for Pyro: if you can partition your task appropriately, you could use it to execute different sections on different machines, as well as on different cores in the same machine.