HDF5 file to pandas DataFrame

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40472912/

Time: 2020-09-14 02:23:24  Source: igfitidea

Tags: python, pandas, hdf5

Asked by Graham Slick

I downloaded a dataset which is stored in .h5 files. I need to keep only certain columns and to be able to manipulate the data in it.

To do this, I tried to load it in a pandas dataframe. I've tried to use:

pd.read_hdf(path)

But I get: No dataset in HDF5 file.

I've found answers on SO (read HDF5 file to pandas DataFrame with conditions), but I don't need conditions, and that answer depends on how the file was written. I'm not the creator of the file, so I can't do anything about that.

I've also tried using h5py:

df = h5py.File(path)

But this is not easy to manipulate, and I can't seem to get the columns out of it (only the names of the columns, using df.keys()). Any idea how to do this?

Answered by drj

Pandas HDF support needs the HDF file to be formatted in a very specific way. See https://stackoverflow.com/a/33644128/4128030 for more info.
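
Since the file was not written by pandas, it can help to look at what it actually contains before trying to load it. The following is a minimal sketch, not part of the original answer: the helper name list_datasets and the demo file with its dataset names are made up for illustration. It uses h5py to walk the file and report every dataset with its shape and dtype.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in the file."""
    found = {}

    def visit(name, obj):
        # visititems walks both groups and datasets; keep only datasets.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


# Hypothetical demo file, written with h5py rather than pandas.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("variable_1", data=np.arange(6.0).reshape(3, 2))
    f.create_dataset("group_a/variable_2", data=np.arange(3))

layout = list_datasets(demo_path)
for name, (shape, dtype) in layout.items():
    print(name, shape, dtype)
```

On a real file, call list_datasets(path) with your own path; the names it reports are the keys you can index with h5py.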

Answered by Ivan Mitevski

The easiest way to read them into pandas is to open the file with h5py, convert the dataset to an np.array, and then build a DataFrame from it. It would look something like:

df = pd.DataFrame(np.array(h5py.File(path, 'r')['variable_1']))
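
To keep only certain columns, the same idea can be extended to several datasets at once. This is a rough sketch under assumptions: it supposes each wanted column is stored as a separate 1-D dataset of equal length, and the helper h5_to_dataframe, the demo file, and the names variable_2 and variable_3 are hypothetical. Your file's layout may well differ.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def h5_to_dataframe(path, names):
    """Load the named 1-D datasets from an HDF5 file as DataFrame columns."""
    with h5py.File(path, "r") as f:
        # ds[()] reads the whole dataset into memory as a NumPy array,
        # so the data stays usable after the file is closed.
        columns = {name: f[name][()] for name in names}
    return pd.DataFrame(columns)


# Hypothetical demo file standing in for the downloaded dataset.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("variable_1", data=np.array([1.0, 2.0, 3.0]))
    f.create_dataset("variable_2", data=np.array([10, 20, 30]))
    f.create_dataset("variable_3", data=np.array([7, 8, 9]))

# Keep only the two columns of interest.
df = h5_to_dataframe(demo_path, ["variable_1", "variable_2"])
print(df)
```

Opening the file in a with block and in read-only mode closes it promptly and avoids modifying a file you did not create.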