How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?
Source: http://stackoverflow.com/questions/38460744/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Asked by JianguoHisiang
I have the following pandas dataframe:
import pandas as pd
df = pd.read_csv('filename.csv')
Now, I can use HDFStore to write the df object to file (like adding key-value pairs to a Python dictionary):
store = pd.HDFStore('store.h5')
store['df'] = df
http://pandas.pydata.org/pandas-docs/stable/io.html
When I look at the contents, this object is a frame.
store
outputs
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df frame (shape->[552,23252])
However, in order to use indexing, one should store this as a table object.
My approach was to try HDFStore.put(), i.e.
HDFStore.put(key="store.h", value=df, format=Table)
However, this fails with the error:
TypeError: put() missing 1 required positional argument: 'self'
How does one save Pandas Dataframes as PyTables tables?
Answered by MaxU
Common part: create a new HDFStore file or open an existing one:
store = pd.HDFStore('store.h5')
Try this if you want all columns indexed:
store.append('key_name', df, data_columns=True)
or this if you want just a subset of columns indexed:
store.append('key_name', df, data_columns=['colA','colC','colN'])
PS: HDFStore.append() saves DataFrames in table format by default.
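The point of indexing data columns is that they can then be filtered on disk. A minimal sketch of that, assuming a small made-up frame (the data, the extra colB column, the function name append_and_query and the temporary file path are illustrative and not from the question):

```python
import os
import tempfile

import numpy as np
import pandas as pd


def append_and_query(path):
    """Append a small frame in table format, then query it on disk."""
    # Hypothetical data; colA and colC echo the column names used above.
    df = pd.DataFrame({
        "colA": np.arange(10),
        "colB": np.linspace(0.0, 1.0, 10),
        "colC": list("abcdefghij"),
    })
    store = pd.HDFStore(path)
    try:
        # append() writes in table format; only colA and colC become queryable.
        store.append("key_name", df, data_columns=["colA", "colC"])
        # The where clause is evaluated against the file, so only
        # the matching rows are loaded into memory.
        result = store.select("key_name", where="colA > 6")
    finally:
        store.close()
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(append_and_query(os.path.join(tmp, "store.h5")))
```

Columns not listed in data_columns (colB here) are still stored, but they cannot be used in a where clause.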
Answered by miraculixx
How does one save Pandas Dataframes as PyTables tables?
Adding to the accepted answer, you should always close the PyTables file. For convenience, pandas provides HDFStore as a context manager:
with pd.HDFStore('/path/to/data.hdf') as hdf:
    hdf.put(key="store.h", value=df, format='table', data_columns=True)
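The TypeError in the question comes from calling put() on the HDFStore class itself rather than on an opened store, so no instance is bound to self. A small sketch of the working pattern, with a check that the result really is a table; the helper name put_as_table, the key "df" and the column x are assumptions made for illustration:

```python
import os
import tempfile

import pandas as pd


def put_as_table(df, path, key="df"):
    """Write df in table format and read back the rows where x >= 2."""
    # put() is an instance method, so it is called on an opened store,
    # and format is passed as the string 'table'.
    with pd.HDFStore(path) as hdf:
        hdf.put(key=key, value=df, format="table", data_columns=True)
        # The storer reports whether the node is a table or a fixed frame.
        is_table = hdf.get_storer(key).is_table
        subset = hdf.select(key, where="x >= 2")
    # Leaving the with block closes the file.
    return is_table, subset


if __name__ == "__main__":
    frame = pd.DataFrame({"x": [1, 2, 3], "y": [0.1, 0.2, 0.3]})
    with tempfile.TemporaryDirectory() as tmp:
        print(put_as_table(frame, os.path.join(tmp, "data.hdf")))
```

Unlike append(), put() replaces whatever is already stored under the key unless append=True is passed.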