Python pandas: Reading specific values from HDF5 files using read_hdf and HDFStore.select
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
Original URL: http://stackoverflow.com/questions/26302480/
Asked by ccsv
So I created an HDF5 file with a simple dataset that looks like this:
>>> pd.read_hdf('STORAGE2.h5', 'table')
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
Using this script:
import pandas as pd
import scipy as sp
from pandas.io.pytables import Term
store = pd.HDFStore('STORAGE2.h5')
df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df_tl.to_hdf('STORAGE2.h5','table',append=True)
I know I can select columns using:
x = pd.read_hdf('STORAGE2.h5', 'table',  columns=['A'])
or
x = store.select('table', where = 'columns=A')
How would I select all rows where column 'A' equals 3, or specific indices where column 'A' holds a string such as 'foo'? In pandas DataFrames I would use df[df["A"]==3] or df[df["A"]=='foo'].
Also, does it make a difference in efficiency if I use read_hdf() or store.select()?
Accepted answer by Jeff
You need to specify data_columns= when writing the table (you can also pass True to make all columns searchable).
(FYI, mode='w' starts the file over and is only there for my example.)
In [50]: df_tl.to_hdf('STORAGE2.h5','table',append=True,mode='w',data_columns=['A'])
In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]: 
   A  B
3  3  3
4  4  4
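The same approach should cover the equality and string cases from the question. What follows is a minimal sketch, not part of the original answer: the temporary file path, the extra string column S, and the sample values are all assumptions made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file location and sample data, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
df = pd.DataFrame(
    {
        "A": [0, 1, 2, 3, 3],
        "S": ["foo", "bar", "foo", "baz", "foo"],  # assumed string column
        "B": list(range(5)),
    }
)

# Table format plus data_columns makes A and S queryable on disk.
df.to_hdf(path, key="table", mode="w", format="table", data_columns=["A", "S"])

# Numeric equality through read_hdf.
eq3 = pd.read_hdf(path, key="table", where="A == 3")

with pd.HDFStore(path, mode="r") as store:
    # String equality: quote the literal inside the where expression.
    foo = store.select("table", where='S == "foo"')
    # A row filter and a column subset can be combined in one call.
    only_a = store.select("table", where="A > 2", columns=["A"])

print(eq3)
print(foo)
print(only_a)
```

On the efficiency question: as far as I know, read_hdf opens the file, runs the same select internally, and closes it again, so a single query should cost about the same either way. Keeping an HDFStore open mainly saves reopening the file when you run several queries in a row.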

