Python: save a numpy array in append mode

Question

Asked by user3820991

Is it possible to save a numpy array appending it to an already existing npy-file --- something like np.save(filename,arr,mode='a')?

I have several functions that have to iterate over the rows of a large array. I cannot create the array all at once because of memory constraints. To avoid creating the rows over and over again, I wanted to create each row once and save it to a file, appending it to the previous row in the file. Later I could load the npy-file in mmap_mode, accessing the slices when needed.

Answer 1

Accepted answer by rth

The built-in .npy file format is perfectly fine for working with small datasets, without relying on external modules other than numpy.

However, when you start having large amounts of data, the use of a file format, such as HDF5, designed to handle such datasets, is to be preferred [1].

For instance, below is a solution to save numpy arrays in HDF5 with PyTables:

Step 1: Create an extendable EArray storage

import tables
import numpy as np

filename = 'outarray.h5'
ROW_SIZE = 100
NUM_COLUMNS = 200

f = tables.open_file(filename, mode='w')
atom = tables.Float64Atom()

array_c = f.create_earray(f.root, 'data', atom, (0, ROW_SIZE))

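# Despite its name, NUM_COLUMNS is the number of rows appended here;
# each appended row holds ROW_SIZE values.
for idx in range(NUM_COLUMNS):
    x = np.random.rand(1, ROW_SIZE)
    array_c.append(x)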
f.close()

Step 2: Append rows to an existing dataset (if needed)

f = tables.open_file(filename, mode='a')
f.root.data.append(x)  # x must have shape (n, ROW_SIZE)
f.close()

Step 3: Read back a subset of the data

f = tables.open_file(filename, mode='r')
print(f.root.data[1:10,2:20]) # e.g. read from disk only this part of the dataset
f.close()

Answer 2

Answer by Mohit Pandey

For appending data to an already existing file using numpy.save, we should use:

# file() exists only in Python 2; open() with 'ab' (binary append) works in both 2 and 3
with open(filename, 'ab') as f_handle:
    numpy.save(f_handle, arr)

I have checked that it works in python 2.7 and numpy 1.10.4

I adapted the code from here, which discusses the savetxt method.
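
One caveat may be worth illustrating. As far as I understand it, each np.save call on the open handle writes a complete .npy record (header plus data), so the file ends up holding several independent arrays rather than one larger array. The sketch below uses a hypothetical scratch file (not from the answer above) and suggests that np.load with mmap_mode then maps only the first record:

```python
import os
import tempfile
import numpy as np

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "appended.npy")

for chunk in (np.zeros(2), np.ones(3)):
    with open(path, "ab") as f:  # binary append mode
        np.save(f, chunk)        # writes a full header + data record each time

# Memory-mapping reads the first header and maps only that record.
first = np.load(path, mmap_mode="r")
print(first.shape)  # (2,)
```

So this approach is convenient for writing, but it does not by itself give a single array that can be sliced through mmap_mode.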

Answer 3

Answer by Evgeny Remizov

.npy files contain a header that holds the shape and dtype of the array. If you know what your resulting array looks like, you can write the header yourself and then write the data in chunks. E.g., here is the code for concatenating 2D matrices:

import numpy as np
import numpy.lib.format as fmt

def get_header(fnames):
    dtype = None
    shape_0 = 0
    shape_1 = None
    for i, fname in enumerate(fnames):
        m = np.load(fname, mmap_mode='r') # mmap so we read only header really fast
        if i == 0:
            dtype = m.dtype
            shape_1 = m.shape[1]
        else:
            assert m.dtype == dtype
            assert m.shape[1] == shape_1
        shape_0 += m.shape[0]
    return {'descr': fmt.dtype_to_descr(dtype), 'fortran_order': False, 'shape': (shape_0, shape_1)}

def concatenate(res_fname, input_fnames):
    header = get_header(input_fnames)
    with open(res_fname, 'wb') as f:
        fmt.write_array_header_2_0(f, header)
        for fname in input_fnames:
            m = np.load(fname)
            f.write(m.tobytes('C'))  # tobytes() replaces tostring(), which was removed in NumPy 2.0
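
A related option, when the final shape is known before any row is computed, is to let numpy write the header and preallocate the file with numpy.lib.format.open_memmap, then fill the rows one at a time. This is only a sketch: the shape, the file name and the row-producing expression are placeholders I made up for illustration.

```python
import os
import tempfile
import numpy as np
from numpy.lib.format import open_memmap

n_rows, n_cols = 1000, 50  # assumed final shape, known in advance
path = os.path.join(tempfile.mkdtemp(), "rows.npy")  # hypothetical file name

# Create a valid .npy file of the final size and map it for writing.
out = open_memmap(path, mode="w+", dtype=np.float64, shape=(n_rows, n_cols))
for i in range(n_rows):
    out[i] = np.full(n_cols, i, dtype=np.float64)  # stand-in for an expensive row
out.flush()
del out  # release the memory map

# Later: slice the file without loading all of it into memory.
rows = np.load(path, mmap_mode="r")
print(rows[10:12, :3])
```

Only one row needs to live in memory while the file is being filled, and the result is an ordinary .npy file.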

If you need a more general solution (editing the header in place while appending), you'll have to resort to fseek tricks like in [1].
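
A minimal sketch of that idea, under some loud assumptions: the file is format version 1.0, C-ordered, and the rewritten header happens to be exactly as long as the old one (the function raises otherwise rather than corrupting the file). The name append_rows and the scratch file are hypothetical, and this is not the code from [1].

```python
import io
import os
import tempfile
import numpy as np
import numpy.lib.format as fmt

def append_rows(fname, rows):
    """Append rows along axis 0 to a C-ordered, version 1.0 .npy file."""
    with open(fname, "rb+") as f:
        if fmt.read_magic(f) != (1, 0):
            raise ValueError("this sketch only handles format version 1.0")
        shape, fortran_order, dtype = fmt.read_array_header_1_0(f)
        data_start = f.tell()  # the header ends here and the data begins
        rows = np.ascontiguousarray(rows, dtype=dtype)
        if fortran_order or rows.shape[1:] != shape[1:]:
            raise ValueError("incompatible layout or row shape")

        # Build the new header in memory and check it fits the old slot.
        new_shape = (shape[0] + rows.shape[0],) + shape[1:]
        buf = io.BytesIO()
        fmt.write_array_header_1_0(buf, {"descr": fmt.dtype_to_descr(dtype),
                                         "fortran_order": False,
                                         "shape": new_shape})
        if buf.tell() != data_start:
            raise ValueError("header length changed; rewrite the file instead")

        f.seek(0, os.SEEK_END)
        f.write(rows.tobytes("C"))  # append the data first
        f.seek(0)
        f.write(buf.getvalue())     # then overwrite the header in place

path = os.path.join(tempfile.mkdtemp(), "grow.npy")  # hypothetical file name
np.save(path, np.zeros((2, 3)))
append_rows(path, np.ones((3, 3)))
grown = np.load(path, mmap_mode="r")
print(grown.shape)
```

Writing the data before the header means that an interrupted append leaves the old header in place, so the file should still load with its previous shape.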

Inspired by
[1]: https://mail.scipy.org/pipermail/numpy-discussion/2009-August/044570.html (doesn't work out of the box)
[2]: https://docs.scipy.org/doc/numpy/neps/npy-format.html
[3]: https://github.com/numpy/numpy/blob/master/numpy/lib/format.py

Answer 4

Answer by Sakhri Houssem

You can try reading the existing file and then adding the new data:

import numpy as np
import os.path

x = np.arange(10) #[0 1 2 3 4 5 6 7 8 9]

y = np.load("save.npy") if os.path.isfile("save.npy") else np.empty(0, dtype=x.dtype) #load existing data if any; an empty array of the same dtype keeps the result integer
np.save("save.npy",np.append(y,x)) #save the new

After running it twice:

print(np.load("save.npy")) #[0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9]

Answer 5

Answer by PaxRomana99

This is an expansion on Mohit Pandey's answer showing a full save / load example. It was tested using Python 3.6 and Numpy 1.11.3.

from pathlib import Path
import numpy as np
import os

p = Path('temp.npy')
with p.open('ab') as f:
    np.save(f, np.zeros(2))
    np.save(f, np.ones(2))

with p.open('rb') as f:
    fsz = os.fstat(f.fileno()).st_size
    out = np.load(f)
    while f.tell() < fsz:
        out = np.vstack((out, np.load(f)))

out = array([[ 0., 0.], [ 1., 1.]])
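
A possible variation, in case the file holds many records: calling np.vstack inside the loop copies the accumulated result on every iteration, so it may be cheaper to collect the arrays in a list and stack once at the end. The helper name load_all and the scratch path are my own assumptions, not part of the answer above.

```python
import os
import tempfile
import numpy as np

def load_all(path):
    """Return every array stored back to back in one file, in write order."""
    arrays = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        while f.tell() < size:  # stop once the last record has been read
            arrays.append(np.load(f))
    return arrays

path = os.path.join(tempfile.mkdtemp(), "temp.npy")  # hypothetical scratch file
with open(path, "ab") as f:
    np.save(f, np.zeros(2))
    np.save(f, np.ones(2))

out = np.vstack(load_all(path))  # stack once, after all reads
print(out.shape)  # (2, 2)
```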
