hdf5 file to pandas dataframe
Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/40472912/
Asked by Graham Slick
I downloaded a dataset which is stored in .h5 files. I need to keep only certain columns and to be able to manipulate the data in it.
To do this, I tried to load it in a pandas dataframe. I've tried to use:
pd.read_hdf(path)
But I get: No dataset in HDF5 file.
I've found answers on SO (read HDF5 file to pandas DataFrame with conditions), but I don't need conditions. The answer there also depends on how the file was written, and since I'm not the creator of the file, I can't do anything about that.
I've also tried using h5py:
df = h5py.File(path)
But this is not easily manipulable and I can't seem to get the columns out of it (only the names of the columns, using df.keys()).
Any idea on how to do this?
Answered by drj
Pandas HDF support needs the HDF file to be formatted very specifically. You can see https://stackoverflow.com/a/33644128/4128030 for more info.
Answered by Ivan Mitevski
The easiest way to read them into Pandas is to open the file with h5py, convert the dataset to an np.array, and then turn that into a DataFrame. It would look something like:
df = pd.DataFrame(np.array(h5py.File(path)['variable_1']))
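A slightly fuller sketch of the same idea, in case it helps with keeping only certain columns. It assumes the file holds top-level 1-D datasets of equal length, one per column, which the question suggests but does not confirm. The helper name h5_to_dataframe, the file name example.h5, and the datasets variable_2 and variable_3 are made up for illustration; only 'variable_1' comes from the line above.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def h5_to_dataframe(path, columns=None):
    """Load top-level datasets into a DataFrame (hypothetical helper).

    Assumes every selected dataset is 1-D and all have the same length.
    """
    with h5py.File(path, "r") as f:
        # Keep only top-level datasets; nested groups are skipped.
        names = [k for k in f.keys() if isinstance(f[k], h5py.Dataset)]
        if columns is not None:
            names = [k for k in names if k in columns]
        # dataset[()] reads the whole dataset into memory as a NumPy array.
        data = {k: f[k][()] for k in names}
    return pd.DataFrame(data)


# Build a small example file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("variable_1", data=np.arange(5))
    f.create_dataset("variable_2", data=np.linspace(0.0, 1.0, 5))
    f.create_dataset("variable_3", data=np.ones(5))

# Keep only the columns of interest.
df = h5_to_dataframe(path, columns=["variable_1", "variable_2"])
print(df.head())
```

If the real file nests its datasets inside groups, or stores a single 2-D array, the selection logic would need to be adapted; listing f.keys() and checking each dataset's shape first is a reasonable way to find out.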