Python pandas: reading specific values from an HDF5 file using read_hdf and HDFStore.select

Question

Asked by ccsv

I created an HDF5 file with a simple dataset that looks like this:

>>> pd.read_hdf('STORAGE2.h5', 'table')
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

It was written with this script:

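import pandas as pd

# Open (or create) the HDF5 store; used later for store.select().
store = pd.HDFStore('STORAGE2.h5')

# Five-row example frame with two integer columns.
df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))

# append=True writes the frame in the queryable 'table' format.
df_tl.to_hdf('STORAGE2.h5', key='table', append=True)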

I know I can select columns using:

x = pd.read_hdf('STORAGE2.h5', 'table',  columns=['A'])

or

x = store.select('table', where = 'columns=A')

How would I select all rows where column 'A' equals 3, or rows where column 'A' holds a specific string such as 'foo'? In a pandas DataFrame I would use df[df["A"]==3] or df[df["A"]=='foo'].

Also, does it make a difference in efficiency whether I use read_hdf() or store.select()?

Answer 1

Accepted answer by Jeff

You need to specify data_columns= when writing (you can also pass True to make all columns searchable).

(FYI, mode='w' starts the file over and is used here only for my example.)

In [50]: df_tl.to_hdf('STORAGE2.h5','table',append=True,mode='w',data_columns=['A'])

In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]: 
   A  B
3  3  3
4  4  4
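
To cover the equality and string cases from the question, here is a minimal sketch along the same lines. The string column "S", the file name example.h5, and the temporary directory are assumptions made up for illustration; they are not part of the original data. It also assumes PyTables is installed. As far as I know, read_hdf opens the store, runs a select, and closes it again, so for a single query the two calls should perform about the same; keeping an HDFStore open mainly helps when you run many queries against one file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: "A" and "B" mirror the question, "S" is an added string column.
df = pd.DataFrame(
    {
        "A": [0, 1, 2, 3, 4],
        "B": [0, 1, 2, 3, 4],
        "S": ["foo", "bar", "foo", "baz", "qux"],
    }
)

# Write to a throwaway file so the example does not touch STORAGE2.h5.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# format="table" is needed for where= queries; data_columns=True makes
# every column usable in a where clause.
df.to_hdf(path, key="table", mode="w", format="table", data_columns=True)

# Rows where A equals 3 (the on-disk analogue of df[df["A"] == 3]).
eq3 = pd.read_hdf(path, key="table", where="A == 3")

# Rows where the string column equals "foo"; the literal is quoted inside the query.
foo = pd.read_hdf(path, key="table", where='S == "foo"')

# The same numeric query through an open store, also restricting the returned columns.
with pd.HDFStore(path, mode="r") as store:
    eq3_store = store.select("table", where="A == 3", columns=["A"])

print(eq3)
print(foo)
print(eq3_store)
```

Only columns listed in data_columns (or all of them, with True) can appear in where=; a column that was not declared that way has to be filtered in memory after reading.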
