Python: Input and output numpy arrays to h5py

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/20928136/

Time: 2020-08-18 21:34:45  Source: igfitidea

Input and output numpy arrays to h5py

python, arrays, numpy, h5py

Question by lovespeed

I have a Python script whose output is a large matrix with entries that are all of type float. If I save it with the extension .dat, the file size is on the order of 500 MB. I read that using h5py reduces the file size considerably. So, let's say I have a 2D numpy array named A. How do I save it to an h5py file? Also, how do I read the same file back into a numpy array in a different script, since I need to manipulate the array there?
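Most of the size reduction the question mentions comes from storing binary numbers instead of text. The following is a minimal sketch. It assumes the .dat file was written as text with np.savetxt, although the question does not say how it was produced. The array shape and file names are hypothetical. With savetxt's default '%.18e' format, each float takes about 25 bytes of text. HDF5 stores the same value in 8 bytes, so the HDF5 file should come out roughly a third of the size of the text file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in for the asker's matrix; the real size is not known here.
A = np.random.random(size=(200, 100))

tmpdir = tempfile.mkdtemp()
txt_path = os.path.join(tmpdir, "A.dat")
h5_path = os.path.join(tmpdir, "A.h5")

# Assumption: the .dat file was plain text, e.g. written by np.savetxt.
# Its default format '%.18e' spends 24 characters (plus a separator) per float.
np.savetxt(txt_path, A)

# HDF5 stores the same values as raw 8-byte float64 numbers plus some metadata.
with h5py.File(h5_path, "w") as f:
    f.create_dataset("A", data=A)

txt_size = os.path.getsize(txt_path)
h5_size = os.path.getsize(h5_path)
print(f"text: {txt_size} bytes, HDF5: {h5_size} bytes, raw data: {A.nbytes} bytes")
```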

Answer by JoshAdel

h5py provides a model of datasets and groups. The former are basically arrays, and the latter you can think of as directories. Each one has a name. You should look at the documentation for the API and examples:

http://docs.h5py.org/en/latest/quick.html
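The following is a small sketch of the dataset/group model described above. Groups nest like directories, and datasets are looked up with path-style names. The group names, dataset names, and temporary file path are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "groups_demo.h5")

with h5py.File(path, "w") as f:
    # A group behaves like a directory inside the file.
    results = f.create_group("results")
    # A dataset behaves like a named NumPy array stored in the file.
    results.create_dataset("matrix", data=np.arange(12.0).reshape(3, 4))
    f.create_group("raw").create_dataset("run_01", data=np.zeros(5))

with h5py.File(path, "r") as f:
    top_level = sorted(f.keys())   # members of the root group
    ds = f["results/matrix"]       # path-style lookup, like a filesystem
    ds_name = ds.name              # full path of the dataset inside the file
    matrix = ds[()]                # read the whole dataset into memory
    print(top_level, ds_name, ds.shape, ds.dtype)
```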

A simple example where you are creating all of the data upfront and just want to save it to an hdf5 file would look something like:

In [1]: import numpy as np
In [2]: import h5py
In [3]: a = np.random.random(size=(100,20))
In [4]: h5f = h5py.File('data.h5', 'w')
In [5]: h5f.create_dataset('dataset_1', data=a)
Out[5]: <HDF5 dataset "dataset_1": shape (100, 20), type "<f8">

In [6]: h5f.close()

You can then load that data back in using:

In [10]: h5f = h5py.File('data.h5','r')
In [11]: b = h5f['dataset_1'][:]
In [12]: h5f.close()

In [13]: np.allclose(a,b)
Out[13]: True
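The [:] in the session above matters. On its own, h5f['dataset_1'] is an h5py Dataset object that reads from the file on demand, and slicing it is what copies data into a NumPy array. The following sketch reads only part of a dataset. It reuses the answer's names with a temporary file path.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.h5")
a = np.random.random(size=(100, 20))

with h5py.File(path, "w") as h5f:
    h5f.create_dataset("dataset_1", data=a)

with h5py.File(path, "r") as h5f:
    dset = h5f["dataset_1"]   # a Dataset handle; no array data read yet
    shape = dset.shape        # metadata only
    first_rows = dset[:10]    # reads just the first 10 rows from disk
    column = dset[:, 3]       # NumPy-style slicing also works
    b = dset[:]               # reads everything into a NumPy array

# b is an ordinary in-memory array, so it stays usable after the file is
# closed; dset itself can no longer be read once the file is closed.
print(type(b), b.shape)
```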

Definitely check out the docs:

http://docs.h5py.org

Writing to an HDF5 file requires either h5py or PyTables (each has a different Python API that sits on top of the HDF5 file specification). You should also take a look at the other simple binary formats that numpy provides natively, such as np.save, np.savez, etc.:

http://docs.scipy.org/doc/numpy/reference/routines.io.html
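The following is a minimal sketch of the numpy-native alternatives mentioned above. np.save writes one array per .npy file. np.savez_compressed stores several named arrays in one zipped .npz archive. The array and file names are hypothetical.

```python
import os
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()
a = np.random.random(size=(100, 20))
b = np.arange(10)

# One array per file in NumPy's .npy binary format.
npy_path = os.path.join(tmpdir, "a.npy")
np.save(npy_path, a)
a_loaded = np.load(npy_path)

# Several named arrays in one compressed .npz archive.
npz_path = os.path.join(tmpdir, "arrays.npz")
np.savez_compressed(npz_path, a=a, b=b)
with np.load(npz_path) as archive:  # NpzFile supports the with statement
    names = sorted(archive.files)
    a2, b2 = archive["a"], archive["b"]
```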

Answer by Lavi Avigdor

A cleaner way to handle opening and closing the file, and to avoid leaking file handles and memory:

Prep:

import numpy as np
import h5py

data_to_write = np.random.random(size=(100,20)) # or some such

Write:

with h5py.File('name-of-file.h5', 'w') as hf:
    hf.create_dataset("name-of-dataset",  data=data_to_write)

Read:

with h5py.File('name-of-file.h5', 'r') as hf:
    data = hf['name-of-dataset'][:]
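The question is mainly about file size, but create_dataset only compresses data when asked to. The following is a minimal sketch that uses the same with-block pattern plus h5py's built-in gzip filter. The zero-filled array is a deliberately compressible stand-in. How much a real float matrix shrinks depends on its contents, and random floats barely compress at all.

```python
import os
import tempfile

import h5py
import numpy as np

tmpdir = tempfile.mkdtemp()
# Highly repetitive stand-in data; real results will compress differently.
data_to_write = np.zeros((1000, 1000))

plain_path = os.path.join(tmpdir, "plain.h5")
gzip_path = os.path.join(tmpdir, "gzip.h5")

with h5py.File(plain_path, "w") as hf:
    hf.create_dataset("name-of-dataset", data=data_to_write)

with h5py.File(gzip_path, "w") as hf:
    # gzip level (0-9) goes in compression_opts; h5py picks a chunk shape.
    hf.create_dataset("name-of-dataset", data=data_to_write,
                      compression="gzip", compression_opts=4)

# Reading is unchanged: decompression happens transparently.
with h5py.File(gzip_path, "r") as hf:
    data = hf["name-of-dataset"][:]

plain_size = os.path.getsize(plain_path)
gzip_size = os.path.getsize(gzip_path)
print(plain_size, gzip_size)
```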

Answer by Oscar Rangel

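Python's with statement takes care of closing all the handles when the block exits, even if an error is raised inside it, so the memory tied to them can be reclaimed by the garbage collector.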
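The following small sketch shows what the with statement guarantees: the file is closed when the block exits, even when an exception escapes from it. The dataset name x, the variable handle, and the error are illustrative only. The check relies on h5py File objects evaluating as False once they are closed.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

try:
    with h5py.File(path, "w") as hf:
        hf.create_dataset("x", data=np.arange(5))
        handle = hf  # keep a reference to inspect after the block
        raise RuntimeError("something went wrong mid-write")
except RuntimeError:
    pass

# A closed h5py File is falsy, so this shows the with block closed it.
print(bool(handle))

# Because the handle was released, the file can be reopened right away.
with h5py.File(path, "r") as hf:
    x = hf["x"][:]
```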