How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38460744/

Time: 2020-09-14 01:37:00  Source: igfitidea

Tags: python, pandas, hdf5, pytables, hdfstore

Asked by JianguoHisiang

I have the following pandas dataframe:

import pandas as pd
df = pd.read_csv('filename.csv')

Now, I can use HDFStore to write the df object to a file (like adding key-value pairs to a Python dictionary):

store = pd.HDFStore('store.h5')
store['df'] = df

http://pandas.pydata.org/pandas-docs/stable/io.html

When I look at the contents, this object is a frame.

store 

outputs

<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[552,23252])

However, in order to use indexing, one should store this as a table object.

My approach was to try HDFStore.put(), i.e.

HDFStore.put(key="store.h", value=df, format=Table)

However, this fails with the error:

TypeError: put() missing 1 required positional argument: 'self'

How does one save Pandas Dataframes as PyTables tables?

Answered by MaxU

Common part: create or open an existing HDFStore file:

store = pd.HDFStore('store.h5')

Try this if you want all columns to be indexed:

store.append('key_name', df, data_columns=True)

Or this if you want just a subset of columns to be indexed:

store.append('key_name', df, data_columns=['colA','colC','colN'])

PS: HDFStore.append() saves DataFrames in table format by default.
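
As a rough illustration of why data columns are useful, here is a minimal sketch that appends a small frame and then filters it on disk with a where expression. The sample data, the temporary file path, and the choice of which columns to index are assumptions made for the example; only append() and the data_columns argument come from the answer above, and select() with where is the usual pandas way to query a table-format store.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the column names mirror those used in the answer.
df = pd.DataFrame({
    "colA": [1, 2, 3, 4],
    "colC": ["x", "y", "x", "z"],
    "colN": [0.1, 0.2, 0.3, 0.4],
})

# Hypothetical location: a throwaway file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path) as store:
    # append() writes in table format; data_columns makes these columns queryable.
    store.append("key_name", df, data_columns=["colA", "colC"])

    # Filter rows on disk using the indexed data columns.
    subset = store.select("key_name", where="colA > 2")
    only_x = store.select("key_name", where='colC == "x"')

print(subset)
print(only_x)
```

Columns not listed in data_columns (colN here) are still stored and returned, but they cannot be used in a where expression.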

Answered by miraculixx

How does one save Pandas Dataframes as PyTables tables?

Adding to the accepted answer, you should always close the PyTables file. For convenience, Pandas provides HDFStore as a context manager:

# The file is closed automatically when the with block exits.
with pd.HDFStore('/path/to/data.hdf') as hdf:
    hdf.put(key="store.h", value=df, format='table', data_columns=True)
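
The following is a minimal end-to-end sketch of the same idea, assuming a small made-up DataFrame, a temporary file path, and the key name "df" (all hypothetical). It also shows the likely cause of the TypeError in the question: put() is an instance method, so it has to be called on an opened store object rather than on the HDFStore class itself, and the format is passed as the string 'table'.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the question's CSV file.
df = pd.DataFrame({"colA": [1, 2, 3], "colC": [0.5, 1.5, 2.5]})

# Hypothetical location: a throwaway file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.h5")

# put() is called on the opened store instance, not on the HDFStore class.
with pd.HDFStore(path) as hdf:
    hdf.put(key="df", value=df, format="table", data_columns=True)
    # Check that the object was stored in table format rather than fixed format.
    is_table = hdf.get_storer("df").is_table

# Reopen read-only and load the frame back.
with pd.HDFStore(path, mode="r") as hdf:
    restored = hdf["df"]

print(is_table)
print(restored)
```

Because both stores are opened in with blocks, the underlying file is closed even if an exception is raised while writing or reading.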