pandas can't read HDF5 files created with h5py

Question

Asked by Masha L.

I get a pandas error when I try to read HDF5 files that I created with h5py. Am I just doing something wrong?

import h5py
import numpy as np
import pandas as pd
h5_file = h5py.File('test.h5', 'w')
h5_file.create_dataset('zeros', data=np.zeros(shape=(3, 5)), dtype='f')
h5_file.close()
pd_file = pd.read_hdf('test.h5', 'zeros')

gives an error: TypeError: cannot create a storer if the object is not existing nor a value are passed

I also tried setting the key to '/zeros' (as I would when reading the file with h5py), with no luck.

If I use pandas.HDFStore to read it in, I get an empty store back:

>>> store = pd.HDFStore('test.h5')
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
Empty

I have no trouble reading the newly created file back with h5py:

h5_back = h5py.File('test.h5', 'r')
h5_back['/zeros']
<HDF5 dataset "zeros": shape (3, 5), type "<f4">

Using these versions:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

pd.__version__
'0.16.2'
h5py.__version__
'2.5.0'

Many thanks in advance, Masha

Answer 1

Answered by Kevin S

I've worked a little on the pytables module in pandas.io, and from what I know, pandas' interaction with HDF files is limited to specific structures that pandas understands. To see what these look like, you can try:

import pandas as pd
import numpy as np
# Write a 3x5 float32 DataFrame in the layout that pandas itself expects
pd.DataFrame(np.zeros((3, 5), dtype=np.float32)).to_hdf('test.h5', key='test')

If you open 'test.h5' in HDFView, you will see a path /test with 4 items that are needed to recreate the DataFrame.
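If HDFView is not at hand, the same layout can probably be inspected from Python with h5py. The following is a minimal sketch, assuming PyTables (the "tables" package) is installed so that to_hdf works; the file name 'pandas_test.h5' is a hypothetical name chosen so that the 'test.h5' from the question is not overwritten.

```python
import h5py
import numpy as np
import pandas as pd

# Write a small DataFrame with pandas (this needs the PyTables package).
# 'pandas_test.h5' is a made-up file name for this illustration.
df = pd.DataFrame(np.zeros((3, 5), dtype=np.float32))
df.to_hdf('pandas_test.h5', key='test', mode='w')

# Open the same file with h5py and look at what pandas stored under /test.
with h5py.File('pandas_test.h5', 'r') as f:
    items = sorted(f['test'].keys())             # child datasets of the group
    attr_names = sorted(f['test'].attrs.keys())  # metadata attached to the group

print(items)
print(attr_names)
```

The group carries metadata attributes in addition to the datasets, and a file written directly with h5py has neither, which is consistent with HDFStore reporting it as empty.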

So I think your only option for reading in NumPy arrays is to read them in directly and then convert these to Pandas objects.
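A minimal sketch of that approach, reusing the file and dataset names from the question: read the dataset into a NumPy array with h5py, then wrap it in a DataFrame. The variable names here are only illustrative.

```python
import h5py
import numpy as np
import pandas as pd

# Recreate the file from the question so the example is self-contained.
with h5py.File('test.h5', 'w') as h5_file:
    h5_file.create_dataset('zeros', data=np.zeros(shape=(3, 5)), dtype='f')

# Read the raw dataset with h5py; indexing with [()] loads it as a NumPy array.
with h5py.File('test.h5', 'r') as h5_file:
    zeros = h5_file['zeros'][()]

# Wrap the array in a DataFrame; row and column labels default to integers.
df = pd.DataFrame(zeros)
print(df.shape)
```

If the result then needs to be readable with pd.read_hdf, it can be written back out with df.to_hdf, which stores it in the layout pandas understands.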
