Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/33641246/

Time: 2020-09-14 00:12:58  Source: igfitidea

Pandas can't read hdf5 file created with h5py

Tags: python, pandas, hdf5, h5py

Asked by Masha L.

I get a pandas error when I try to read HDF5 files that I created with h5py. Am I just doing something wrong?

import h5py
import numpy as np
import pandas as pd
h5_file = h5py.File('test.h5', 'w')
h5_file.create_dataset('zeros', data=np.zeros(shape=(3, 5)), dtype='f')
h5_file.close()
pd_file = pd.read_hdf('test.h5', 'zeros')

This gives an error: TypeError: cannot create a storer if the object is not existing nor a value are passed

I also tried setting the key to '/zeros' (as I would with h5py when reading the file), with no luck.

If I use pandas.HDFStore to read it in, I get an empty store back:

>>> store = pd.HDFStore('test.h5')
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
Empty

I have no trouble reading the newly created file back with h5py:

>>> h5_back = h5py.File('test.h5', 'r')
>>> h5_back['/zeros']
<HDF5 dataset "zeros": shape (3, 5), type "<f4">

Using these versions:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

pd.__version__
'0.16.2'
h5py.__version__
'2.5.0'

Many thanks in advance, Masha

Answered by Kevin S

I've worked a little on the pytables module in pandas.io, and from what I know, pandas' interaction with HDF files is limited to specific structures that pandas understands. To see what these look like, you can try

import pandas as pd
import numpy as np
# Write a small DataFrame so the layout pandas expects can be inspected
pd.DataFrame(np.zeros((3, 5), dtype=np.float32)).to_hdf('test.h5', key='test')

If you open 'test.h5' in HDFView, you will see a path /test with 4 items that are needed to recreate the DataFrame.

[Screenshot: HDFView of test.h5]
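
The same layout can be inspected without HDFView. The following is a minimal sketch, assuming PyTables is installed (pandas needs it for to_hdf) and using a hypothetical scratch file named pandas_layout.h5. It writes a small DataFrame with pandas and then lists what ended up under /test using h5py:

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Hypothetical scratch file; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "pandas_layout.h5")

# Write a DataFrame through pandas (this goes through PyTables).
df = pd.DataFrame(np.zeros((3, 5), dtype=np.float32))
df.to_hdf(path, key="test")

# Open the same file with h5py and list the members of the /test group.
with h5py.File(path, "r") as f:
    members = sorted(f["test"].keys())

print(members)
```

With the default fixed format, the group should hold the axis labels plus a block of values, which is the extra structure pandas looks for and a bare h5py dataset does not have.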

So I think your only option for reading in NumPy arrays is to read them in directly and then convert these to Pandas objects.
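
A minimal sketch of that approach, assuming the same dataset name 'zeros' as in the question and a temporary file path chosen here for illustration: read the dataset into a NumPy array with h5py, then wrap it in a DataFrame.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Temporary location for the example file (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "test.h5")

# Recreate the file from the question with h5py.
with h5py.File(path, "w") as h5_file:
    h5_file.create_dataset("zeros", data=np.zeros(shape=(3, 5)), dtype="f")

# Read the dataset into memory as a NumPy array; [()] loads the whole dataset.
with h5py.File(path, "r") as h5_file:
    zeros = h5_file["zeros"][()]

# Wrap the array in a DataFrame; pandas assigns default integer labels.
df = pd.DataFrame(zeros)
print(df.shape)  # (3, 5)
```

Row and column labels are not stored in a plain h5py dataset, so any meaningful index or column names have to be supplied when the DataFrame is built.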
