HDF5 file to Pandas DataFrame

Question

Asked by Graham Slick

I downloaded a dataset which is stored in .h5 files. I need to keep only certain columns and to be able to manipulate the data in it.

To do this, I tried to load it in a pandas dataframe. I've tried to use:

pd.read_hdf(path)

But I get: No dataset in HDF5 file.

I've found answers on SO (read HDF5 file to pandas DataFrame with conditions), but I don't need conditions, and the answer adds requirements about how the file was written. I'm not the creator of the file, so I can't do anything about that.

I've also tried using h5py:

df = h5py.File(path)

But this is not easy to manipulate, and I can't seem to get the columns out of it (only the names of the columns, using df.keys()). Any idea how to do this?

Answer 1

Answered by drj

Pandas HDF support needs the HDF file to be formatted very specifically. See https://stackoverflow.com/a/33644128/4128030 for more info.
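
When pd.read_hdf rejects a file, it can help to look at how the file is actually laid out before deciding how to load it. The following is a minimal sketch using h5py; the helper name list_datasets is hypothetical and not part of any library.

```python
import h5py


def list_datasets(path):
    """Return (name, shape, dtype) for every dataset in the file, at any depth."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Calling list_datasets(path) shows which names are real datasets and what shape each one has, which tells you whether they can be used directly as DataFrame columns.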

Answer 2

Answered by Ivan Mitevski

The easiest way to read them into Pandas is to open the file with h5py, convert the dataset you need into an np.array, and then build a DataFrame from it. It would look something like:

df = pd.DataFrame(np.array(h5py.File(path, 'r')['variable_1']))
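
To keep only certain columns, as the question asks, the same idea can be extended to several datasets at once. This is a rough sketch, not a general loader. It assumes each column is stored as its own top-level, one-dimensional dataset and that all selected datasets have the same length. The function name h5_to_dataframe and its columns parameter are hypothetical.

```python
import h5py
import pandas as pd


def h5_to_dataframe(path, columns=None):
    """Load top-level 1-D datasets of an HDF5 file into a DataFrame.

    Assumption: every wanted column is a top-level dataset of equal length.
    """
    with h5py.File(path, "r") as f:
        # Keep only real datasets; nested groups are skipped.
        names = [k for k in f.keys() if isinstance(f[k], h5py.Dataset)]
        if columns is not None:
            names = [k for k in names if k in columns]
        # dataset[()] reads the whole dataset into memory as a NumPy array,
        # so the data stays valid after the file is closed.
        data = {name: f[name][()] for name in names}
    return pd.DataFrame(data)
```

For example, h5_to_dataframe(path, columns=['variable_1']) would return a DataFrame with a single variable_1 column, provided the file matches the layout assumed above. Files with nested groups or multi-dimensional datasets need different handling.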
