How to read HDF5 files in Python

Question

Asked by Sameer Damir

I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.

My code

import h5py    
import numpy as np    
f1 = h5py.File(file_name,'r+')

This works and the file is read. But how can I access data inside the file object f1?

Answer 1

Accepted answer by Martin Thoma

Read HDF5

import h5py
filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # List all groups
    print("Keys: %s" % f.keys())
    a_group_key = list(f.keys())[0]

    # Get the data
    data = list(f[a_group_key])

Write HDF5

import h5py

# Create random data
import numpy as np
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("group_name", data=data_matrix)
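
To connect the two snippets above, here is a minimal round-trip sketch: it writes the random matrix and then reads it back as a NumPy array. The temporary directory is an assumption made only so the example is self-contained; the dataset name "group_name" follows the answer.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical location: a temporary directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "file.hdf5")

data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write the array as a dataset named "group_name" (the name used above).
with h5py.File(path, "w") as f:
    f.create_dataset("group_name", data=data_matrix)

# Read it back. Indexing with [()] loads the whole dataset into a NumPy array,
# which stays valid after the file is closed.
with h5py.File(path, "r") as f:
    dataset = f["group_name"]
    print(dataset.shape, dataset.dtype)
    restored = dataset[()]

print(np.array_equal(restored, data_matrix))
```

Copying the data out with `[()]` inside the `with` block matters: the dataset object itself cannot be read once the file has been closed.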

See the h5py docs for more information.

Alternatives

JSON: Nice for writing human-readable data; VERY commonly used (read & write)
CSV: Super simple format (read & write)
pickle: A Python serialization format (read & write)
MessagePack (Python package): More compact representation (read & write)
HDF5 (Python package): Nice for matrices (read & write)
XML: exists too *sigh* (read & write)

For your application, the following might be important:

Support by other programming languages
Reading / writing performance
Compactness (file size)

See also: Comparison of data serialization formats

In case you are instead looking for a way to make configuration files, you might want to read my short article, Configuration files in Python.

Answer 2

Answered by Danny

You can use pandas.

import pandas as pd
pd.read_hdf(filename,key)

Answer 3

Answered by Games Brainiac

What you need to do is create a dataset. If you take a look at the quickstart guide, it shows that you need to use the file object in order to create a dataset. So, call f.create_dataset, and then you can read the data. This is explained in the docs.

Answer 4

Answered by Daksh

Reading the file

import h5py

f = h5py.File(file_name, mode)

Studying the structure of the file by printing what HDF5 groups are present

for key in f.keys():
    print(key) #Names of the groups in HDF5 file.

Extracting the data

#Get the HDF5 group
group = f[key]

#Checkout what keys are inside that group.
for key in group.keys():
    print(key)

#Dataset.value was removed in h5py 3.0; indexing with [()] reads the whole dataset
data = group[some_key_inside_the_group][()]
#Do whatever you want with data

#After you are done
f.close()
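
When groups are nested more than one level deep, the key-by-key inspection above can be made recursive. The following is only a sketch: the `describe` helper, the file name, and the dataset names are hypothetical and exist just to make the example runnable.

```python
import os
import tempfile

import h5py
import numpy as np


def describe(group, indent=0):
    """Return one line per member of `group`, descending into subgroups."""
    lines = []
    for name, item in group.items():
        pad = " " * indent
        if isinstance(item, h5py.Dataset):
            lines.append("%s%s: dataset, shape=%s, dtype=%s"
                         % (pad, name, item.shape, item.dtype))
        elif isinstance(item, h5py.Group):
            lines.append("%s%s: group" % (pad, name))
            lines.extend(describe(item, indent + 2))
    return lines


# Build a small hypothetical file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    # Intermediate groups ("measurements") are created automatically.
    f.create_dataset("measurements/x", data=np.arange(5))
    f.create_dataset("measurements/y", data=np.zeros((2, 3)))

with h5py.File(path, "r") as f:
    structure = describe(f)

print("\n".join(structure))
```

The `isinstance` checks distinguish groups, which behave like dictionaries, from datasets, which behave like arrays.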

Answer 5

Answered by ashish bansal

Use the code below to read the data and convert it into NumPy arrays:

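import h5py
import numpy as np

f1 = h5py.File('data_1.h5', 'r')
print(list(f1.keys()))  # names of the top-level groups and datasets
X1 = f1['x']
y1 = f1['y']
# Dataset.value was removed in h5py 3.0; indexing with [()] reads the whole dataset
df1 = np.array(X1[()])
dfy1 = np.array(y1[()])
print(df1.shape)
print(dfy1.shape)
f1.close()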
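
If the datasets are large, it may not be necessary to load them completely. As a hedged sketch, assuming a file with datasets named 'x' and 'y' as above (the file is created here with made-up contents so the example runs on its own), slicing reads only the selected part:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file with datasets named 'x' and 'y', mirroring the names above.
path = os.path.join(tempfile.mkdtemp(), "data_1.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("x", data=np.arange(1000.0).reshape(100, 10))
    f.create_dataset("y", data=np.arange(100))

with h5py.File(path, "r") as f:
    x = f["x"]
    # Shape and dtype come from metadata; no array data is read yet.
    print(x.shape, x.dtype)
    # Slicing reads only the requested rows and returns a NumPy array.
    first_rows = x[:5]
    # Every other label among the first ten.
    some_labels = f["y"][0:10:2]

print(first_rows.shape)
print(some_labels)
```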

Answer 6

Answered by Raza

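To read the raw content of a .hdf5 file as an array, you can do something like the following. Note that np.fromfile interprets the file's bytes directly, HDF5 headers and metadata included, so the result is not the datasets stored in the file; use h5py to get those.

import numpy as np
myarray = np.fromfile('file.hdf5', dtype=float)
print(myarray)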

Answer 7

Answered by Attila

Here's a simple function I just wrote which reads a .hdf5 file generated by the save_weights function in keras and returns a dict with layer names and weights:

import h5py

def read_hdf5(path):

    weights = {}

    keys = []
    with h5py.File(path, 'r') as f: # open file
        f.visit(keys.append) # append all keys to list
        for key in keys:
            if ':' in key: # contains data if ':' in key
                print(f[key].name)
                weights[f[key].name] = f[key][()] # .value was removed in h5py 3.0
    return weights
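
The function above relies on dataset names containing ':'. A more general variant, sketched here under the assumption that every dataset in the file should be collected, checks the object type instead. The helper name `read_all_datasets` and the example file layout are hypothetical; the file only imitates layer/weight naming and is not produced by keras.

```python
import os
import tempfile

import h5py
import numpy as np


def read_all_datasets(path):
    """Return {full dataset name: NumPy array} for every dataset in the file."""
    arrays = {}

    def collect(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            arrays[obj.name] = obj[()]

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return arrays


# Hypothetical file whose layout only imitates layer/weight naming.
path = os.path.join(tempfile.mkdtemp(), "weights.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("dense_1/kernel:0", data=np.ones((4, 2)))
    f.create_dataset("dense_1/bias:0", data=np.zeros(2))

weights = read_all_datasets(path)
print(sorted(weights))
```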

https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b

I haven't tested it thoroughly, but it does the job for me.

Answer 8

Answered by Judice

from keras.models import load_model 

h= load_model('FILE_NAME.h5')

Answer 9

Answered by Patol75

Using bits of answers from this question and the latest docs, I was able to extract my numerical arrays using:

import h5py
with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]

Here 'x' is simply the X coordinate in my case.
