Python: Sharing a Large, Read-Only NumPy Array Between Multiprocessing Processes
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must follow the same CC BY-SA terms, link the original address, and attribute the content to the original authors (not me): StackOverflow
Original address: http://stackoverflow.com/questions/17785275/
Share Large, Read-Only Numpy Array Between Multiprocessing Processes
Asked by Will
I have a 60GB SciPy Array (Matrix) I must share between 5+ multiprocessing Process objects. I've seen numpy-sharedmem and read this discussion on the SciPy list. There seem to be two approaches: numpy-sharedmem, or using a multiprocessing.RawArray() and mapping NumPy dtypes to ctypes. Now, numpy-sharedmem seems to be the way to go, but I've yet to see a good reference example. I don't need any kind of locks, since the array (actually a matrix) will be read-only. Now, due to its size, I'd like to avoid a copy. It sounds like the correct method is to create the only copy of the array as a sharedmem array, and then pass it to the Process objects? A couple of specific questions:
1. What's the best way to actually pass the sharedmem handles to sub-Process()es? Do I need a queue just to pass one array around? Would a pipe be better? Can I just pass it as an argument to the Process() subclass's init (where I'm assuming it's pickled)?
2. In the discussion I linked above, there's mention of numpy-sharedmem not being 64-bit-safe? I'm definitely using some structures that aren't 32-bit addressable.
3. Are there tradeoffs to the RawArray() approach? Slower, buggier?
4. Do I need any ctype-to-dtype mapping for the numpy-sharedmem method?
5. Does anyone have an example of some open-source code doing this? I'm a very hands-on learner and it's hard to get this working without any kind of good example to look at.
If there's any additional info I can provide to help clarify this for others, please comment and I'll add. Thanks!
This needs to run on Ubuntu Linux and maybe Mac OS, but portability isn't a huge concern.
Accepted answer by James Lim
@Velimir Mlaker gave a great answer. I thought I could add some bits of comments and a tiny example.
(I couldn't find much documentation on sharedmem - these are the results of my own experiments.)
- Do you need to pass the handles when the subprocess is starting, or after it has started? If it's just the former, you can just use the target and args arguments for Process. This is potentially better than using a global variable.
- From the discussion page you linked, it appears that support for 64-bit Linux was added to sharedmem a while back, so it could be a non-issue.
- I don't know about this one.
- No. Refer to the example below.
Example
#!/usr/bin/env python
from multiprocessing import Process
import sharedmem
import numpy

def do_work(data, start):
    data[start] = 0

def split_work(num):
    n = 20
    width = n // num  # integer division so the offsets stay ints
    shared = sharedmem.empty(n)
    shared[:] = numpy.random.rand(1, n)[0]
    print("values are %s" % shared)

    processes = [Process(target=do_work, args=(shared, i * width))
                 for i in range(num)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print("values are %s" % shared)
    print("type is %s" % type(shared[0]))

if __name__ == '__main__':
    split_work(4)

Output

values are [ 0.81397784 0.59667692 0.10761908 0.6736734 0.46349645 0.98340718
0.44056863 0.10701816 0.67167752 0.29158274 0.22242552 0.14273156
0.34912309 0.43812636 0.58484507 0.81697513 0.57758441 0.4284959
0.7292129 0.06063283]
values are [ 0. 0.59667692 0.10761908 0.6736734 0.46349645 0.
0.44056863 0.10701816 0.67167752 0.29158274 0. 0.14273156
0.34912309 0.43812636 0.58484507 0. 0.57758441 0.4284959
0.7292129 0.06063283]
type is <class 'numpy.float64'>
This related question might be useful.
Answer by Dr. Jan-Philip Gehrcke
If you are on Linux (or any POSIX-compliant system), you can define this array as a global variable. multiprocessing uses fork() on Linux when it starts a new child process. A newly spawned child process automatically shares the memory with its parent as long as it does not change it (copy-on-write mechanism).
Since you are saying "I don't need any kind of locks, since the array (actually a matrix) will be read-only" taking advantage of this behavior would be a very simple and yet extremely efficient approach: all child processes will access the same data in physical memory when reading this large numpy array.
Don't hand your array to the Process() constructor; this will instruct multiprocessing to pickle the data to the child, which would be extremely inefficient or impossible in your case. On Linux, right after fork() the child is an exact copy of the parent using the same physical memory, so all you need to do is make sure that the Python variable 'containing' the matrix is accessible from within the target function that you hand over to Process(). You can typically achieve this with a 'global' variable.
Example code:
from multiprocessing import Process
from numpy import random

global_array = random.random(10**4)

def child():
    # The child reads the parent's array via copy-on-write shared pages.
    print(sum(global_array))

def main():
    processes = [Process(target=child) for _ in range(10)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

if __name__ == "__main__":
    main()
On Windows -- which does not support fork() -- multiprocessing uses the win32 API call CreateProcess. It creates an entirely new process from any given executable. That's why on Windows one is required to pickle data to the child if one needs data that has been created during the runtime of the parent.
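To illustrate the difference, here is a small sketch (not from the original answer) using the spawn start method, which mimics Windows behavior on any platform in Python 3: the array handed to args is pickled into the child, so the child works on a copy.

```python
import multiprocessing as mp
import numpy as np

def child(arr):
    # arr arrived as a pickled copy; changing it cannot affect the parent.
    arr[:] = 0

def main():
    data = np.ones(1000)
    p = mp.Process(target=child, args=(data,))  # data is pickled here
    p.start()
    p.join()
    print(data.sum())  # still 1000.0 -- the parent's array is untouched

if __name__ == "__main__":
    mp.set_start_method("spawn")  # the only start method available on Windows
    main()
```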
Answer by Saullo G. P. Castro
If your array is that big, you can use numpy.memmap. For example, if you have an array stored on disk, say 'test.array', you can use simultaneous processes to access the data in it even in "writing" mode, but your case is simpler since you only need "reading" mode.
Creating the array:
import numpy as np

a = np.memmap('test.array', dtype='float32', mode='w+', shape=(100000, 1000))
You can then fill this array in the same way you do with an ordinary array. For example:
a[:10, :100] = 1.
a[10:, 100:] = 2.
The data is written to disk when you delete the variable a.
Later on you can use multiple processes that will access the data in test.array:
# read-only mode
b = np.memmap('test.array', dtype='float32', mode='r', shape=(100000,1000))
# read-and-write mode
c = np.memmap('test.array', dtype='float32', mode='r+', shape=(100000,1000))
Answer by Velimir Mlaker
You may be interested in a tiny piece of code I wrote: github.com/vmlaker/benchmark-sharedmem
The only file of interest is main.py. It's a benchmark of numpy-sharedmem -- the code simply passes arrays (either numpy or sharedmem) to spawned processes, via Pipe. The workers just call sum() on the data. I was only interested in comparing the data communication times between the two implementations.
I also wrote another, more complex code: github.com/vmlaker/sherlock.
Here I use the numpy-sharedmem module for real-time image processing with OpenCV -- the images are NumPy arrays, as per OpenCV's newer cv2 API. The images, actually references thereof, are shared between processes via the dictionary object created from multiprocessing.Manager (as opposed to using Queue or Pipe). I'm getting great performance improvements when compared with using plain NumPy arrays.
Pipe vs. Queue:
In my experience, IPC with Pipe is faster than Queue. And that makes sense, since Queue adds locking to make it safe for multiple producers/consumers. Pipe doesn't. But if you only have two processes talking back-and-forth, it's safe to use Pipe, or, as the docs read:
... there is no risk of corruption from processes using different ends of the pipe at the same time.
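As a minimal illustration (my own sketch, not from the repository) of two processes talking over a Pipe -- a plain ndarray is pickled through the pipe, whereas a sharedmem array would send only a lightweight handle:

```python
from multiprocessing import Process, Pipe
import numpy as np

def worker(conn):
    arr = conn.recv()     # receive the array from the parent
    conn.send(arr.sum())  # send a result back the other way
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(np.arange(10.0))
    print(parent_conn.recv())  # 45.0
    p.join()
```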
sharedmem safety:
The main issue with the sharedmem module is the possibility of a memory leak upon ungraceful program exit. This is described in a lengthy discussion here. Although on Apr 10, 2011 Sturla mentioned a fix to the memory leak, I have still experienced leaks since then, using both repos: Sturla Molden's own on GitHub (github.com/sturlamolden/sharedmem-numpy) and Chris Lee-Messer's on Bitbucket (bitbucket.org/cleemesser/numpy-sharedmem).
Answer by Steve Barnes
You may also find it useful to take a look at the documentation for Pyro: if you can partition your task appropriately, you could use it to execute different sections on different machines, as well as on different cores in the same machine.