Original question: http://stackoverflow.com/questions/38268599/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
pandas, store multiple datasets in an h5 file with pd.to_hdf
Asked by atomsmasher
Say I have two dataframes:
import pandas as pd
df1 = pd.DataFrame({'col1':[0,2,3,2],'col2':[1,0,0,1]})
df2 = pd.DataFrame({'col12':[0,1,2,1],'col22':[1,1,1,1]})
Now df1.to_hdf('nameoffile.h5', 'key_to_store', 'w', table=True) successfully stores df1, but I also want to store df2 in the same file. If I use the same call for df2, df1 is simply overwritten: when I load the file and check the keys, I only see df2. How can I store both df1 and df2 in the same h5 file as tables?
Answered by EdChum
You are using mode 'w', which overwrites the file. The default mode is 'a' (append), so you can just do:
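df2.to_hdf('nameoffile.h5', 'key_to_store_df2', table=True, mode='a')
Note that df2 needs its own key (the name used here is only an example): mode='a' keeps the existing file, but writing to a key that already exists replaces whatever is stored under that key.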
Check the docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html#pandas.DataFrame.to_hdf
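As a minimal sketch of the difference between the two modes, the following writes both frames from the question to one file and lists the keys it ends up with. It assumes the optional PyTables package (tables) is installed, and the temporary path and the key names 'df1' and 'df2' are illustrative only. It also uses format='table', which is how recent pandas releases spell the older table=True argument.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# pandas relies on the optional PyTables package ("tables") for HDF5 I/O.
HAVE_PYTABLES = importlib.util.find_spec("tables") is not None

df1 = pd.DataFrame({'col1': [0, 2, 3, 2], 'col2': [1, 0, 0, 1]})
df2 = pd.DataFrame({'col12': [0, 1, 2, 1], 'col22': [1, 1, 1, 1]})

keys = []
if HAVE_PYTABLES:
    # Illustrative location; any writable path works.
    path = os.path.join(tempfile.mkdtemp(), 'nameoffile.h5')

    # mode='w' starts a fresh file, discarding anything already in it.
    df1.to_hdf(path, key='df1', mode='w', format='table')
    # mode='a' (the default) keeps the file and adds a second key.
    df2.to_hdf(path, key='df2', mode='a', format='table')

    # List the keys stored in the file.
    with pd.HDFStore(path, mode='r') as store:
        keys = sorted(store.keys())
    print(keys)

    # Each frame is read back through its own key.
    df1_back = pd.read_hdf(path, key='df1')
    df2_back = pd.read_hdf(path, key='df2')
```

Each frame lives under its own key, so either one can be loaded without touching the other.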
Answered by Grr
I have used this in the past without issue:
store = pd.HDFStore(path_to_hdf)
store[new_df_name] = df2
store.close()
So in your case you could try:
store = pd.HDFStore(path_to_hdf)
store['df1'] = df1
store['df2'] = df2
store.close()
I used this in a system where a user could store layouts for microtiter plate experiments. The first time they saved a layout the hdf file was created and subsequent layouts could then be appended to the file.
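The create-then-append workflow described above could look roughly like the sketch below. The helper save_layout, the file name layouts.h5, the key names, and the layout columns are all hypothetical, chosen only to illustrate the pattern, and the code assumes PyTables is installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# pandas relies on the optional PyTables package ("tables") for HDF5 I/O.
HAVE_PYTABLES = importlib.util.find_spec("tables") is not None


def save_layout(path, name, layout):
    """Store `layout` under the key `name` (hypothetical helper)."""
    # HDFStore opens in mode='a' by default: the file is created on first
    # use, and keys written by earlier calls are preserved.
    with pd.HDFStore(path) as store:
        store[name] = layout


saved = []
if HAVE_PYTABLES:
    # Illustrative location and data.
    path = os.path.join(tempfile.mkdtemp(), 'layouts.h5')
    plate_a = pd.DataFrame({'row': [0, 0, 1], 'column': [0, 1, 0],
                            'volume_ul': [50.0, 50.0, 25.0]})
    plate_b = pd.DataFrame({'row': [0, 1], 'column': [0, 0],
                            'volume_ul': [10.0, 20.0]})

    save_layout(path, 'plate_a', plate_a)  # first call creates the file
    save_layout(path, 'plate_b', plate_b)  # later calls add new keys

    with pd.HDFStore(path, mode='r') as store:
        saved = sorted(store.keys())
    print(saved)
```

Using the store as a context manager closes the file even if the write raises, which replaces the explicit store.close() call.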
N.B. I have set pd.set_option('io.hdf.default_format', 'table') at the beginning of my program.
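To illustrate what that option changes, here is a small sketch, again assuming PyTables is installed and using an illustrative temporary path. It uses pd.option_context rather than pd.set_option so the setting is limited to the block; that is a choice made for this example, not what the answer above does. With the default format set to 'table', store[key] = df writes a table, and rows can then be appended to it.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# pandas relies on the optional PyTables package ("tables") for HDF5 I/O.
HAVE_PYTABLES = importlib.util.find_spec("tables") is not None

df1 = pd.DataFrame({'col1': [0, 2, 3, 2], 'col2': [1, 0, 0, 1]})

is_table = None
n_rows = None
if HAVE_PYTABLES:
    # Illustrative location; any writable path works.
    path = os.path.join(tempfile.mkdtemp(), 'nameoffile.h5')

    # The option is only active inside this block.
    with pd.option_context('io.hdf.default_format', 'table'):
        with pd.HDFStore(path, mode='w') as store:
            # Written in table format because of the option above.
            store['df1'] = df1
            is_table = store.get_storer('df1').is_table
            # Appending rows to an existing key requires table format.
            store.append('df1', df1)
            n_rows = len(store['df1'])
    print(is_table, n_rows)
```

Without the option, store[key] = df uses the fixed format, which is faster to write but does not support appending rows or querying subsets.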