Pytables table into pandas DataFrame
Original source: http://stackoverflow.com/questions/12924264/
Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Asked by Jim Knoll
There is lots of information on how to read a csv into a pandas DataFrame, but what I have is a PyTables table and I want a pandas DataFrame.
I've found how to store my pandas DataFrame to pytables... then I want to read it back; at this point it will have:
"kind = v._v_attrs.pandas_type"
I could write it out as csv and re-read it in but that seems silly. It is what I am doing for now.
How should I be reading pytable objects into pandas?
Answered by meteore
import numpy as np
import pandas as pd
import tables as pt

# the content is junk but we don't care
# (a 1-D structured array: one record per table row)
grades = np.empty(10, dtype=[('name', 'S20'), ('grade', 'u2')])
# write to a PyTables table
# (open_file/create_table are the PyTables 3 names for openFile/createTable)
handle = pt.open_file('/tmp/test_pandas.h5', 'w')
handle.create_table('/', 'grades', grades)
print(handle.root.grades[:].dtype)  # it is a structured array
# load back as a DataFrame and check types
df = pd.DataFrame.from_records(handle.root.grades[:])
print(df.dtypes)
handle.close()
Beware that your u2 (unsigned 2-byte integer) will end up as an i8 (8-byte integer), and the strings will be objects, because pandas does not yet support the full range of dtypes that are available for NumPy arrays.
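To check how the dtypes come through on your own pandas version, the sketch below builds a structured array with the same field layout as the grades table (no HDF5 file involved), converts it with from_records, and then casts the columns explicitly. The sample values and the choice of UTF-8 for decoding are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample records using the same field layout as the grades table.
records = np.array(
    [(b"alice", 90), (b"bob", 75)],
    dtype=[("name", "S20"), ("grade", "u2")],
)

df = pd.DataFrame.from_records(records)
print(df.dtypes)  # inspect what your pandas version inferred

# The S20 field holds byte strings; decode them to text (UTF-8 assumed).
df["name"] = df["name"].str.decode("utf-8")
# Cast explicitly if the unsigned 2-byte integer type matters downstream.
df["grade"] = df["grade"].astype("uint16")
```

Casting after the load keeps the conversion step itself unchanged, so the same two lines can be applied to a DataFrame built from handle.root.grades[:].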
Answered by Andy Hayden
The docs now include an excellent section on using the HDF5 store, and there are some more advanced strategies discussed in the cookbook.
It's now relatively straightforward:
In [1]: store = HDFStore('store.h5')
In [2]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty
In [3]: df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
In [4]: store['df'] = df
In [5]: store
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df frame (shape->[2,2])
And to retrieve from HDF5/pytables:
In [6]: store['df'] # store.get('df') is an equivalent
Out[6]:
A B
0 1 2
1 3 4
You can also query within a table.
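A minimal sketch of such a query, assuming PyTables is installed and the frame is stored in the queryable "table" format: it writes the same two-row frame to a scratch file and selects the rows where A is greater than 1. The temporary file path and the condition are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 3], "B": [2, 4]})

# Hypothetical scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path) as store:
    # format="table" stores a queryable PyTables Table;
    # data_columns=True makes every column usable in a where clause.
    store.put("df", df, format="table", data_columns=True)
    # The condition is evaluated by the store; only matching rows come back.
    subset = store.select("df", where="A > 1")

print(subset)
```

Note that assigning with store['df'] = df uses the default fixed format, which does not support where clauses; that is why the sketch calls put with format="table".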

