Python: How to store a dictionary in an HDF5 dataset

Question

Asked by theta

I have a dictionary whose keys are datetime objects and whose values are tuples of integers:

>>> list(d.items())[0]
(datetime.datetime(2012, 4, 5, 23, 30), (14, 1014, 6, 3, 0))

I want to store it in an HDF5 dataset, but if I try to just dump the dictionary, h5py raises an error:

TypeError: Object dtype dtype('object') has no native HDF5 equivalent

What would be "the best" way to transform this dictionary so that I can store it in an HDF5 dataset?

Specifically, I don't want to just dump the dictionary into a NumPy array, as that would complicate retrieving data by datetime query.

Answer 1

Answered by theta

I found two ways to do this:

I) Convert the datetime object to a string and use it as the dataset name

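import h5py
import numpy as np
h = h5py.File('myfile.hdf5', 'w')  # 'w' creates/truncates; h5py >= 3.0 opens read-only by default
for k, v in d.items():
    # int32 rather than int8: an int8 cannot hold values such as 1014
    h.create_dataset(k.strftime('%Y-%m-%dT%H:%M:%SZ'), data=np.array(v, dtype=np.int32))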

The data can then be accessed by querying the key strings (the dataset names). For example:

for ds in h.keys():
    if '2012-04' in ds:  # substring match on the dataset name
        print(h[ds][()])  # [()] reads the whole dataset; .value was removed in h5py 3.0
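
Because the names use a fixed-width ISO 8601 format, they can also be parsed back into datetimes, which allows real range queries instead of substring matching. A minimal sketch, assuming every top-level name was written with the '%Y-%m-%dT%H:%M:%SZ' format above; `select_range` is a hypothetical helper, and `names` can be any iterable of names, such as `h.keys()`:

```python
from datetime import datetime

FMT = '%Y-%m-%dT%H:%M:%SZ'  # must match the format used when the datasets were created

def select_range(names, start, end):
    """Return the names whose timestamp t satisfies start <= t < end, in chronological order."""
    selected = []
    for name in names:
        t = datetime.strptime(name, FMT)  # parse the name back into a naive datetime
        if start <= t < end:
            selected.append(name)
    # fixed-width ISO strings sort chronologically, so a plain string sort is enough
    return sorted(selected)

names = ['2012-04-05T23:30:00Z', '2012-05-01T00:00:00Z', '2012-04-01T08:15:00Z']
print(select_range(names, datetime(2012, 4, 1), datetime(2012, 5, 1)))
```

With the file from the example above, `for name in select_range(h.keys(), start, end): print(h[name][()])` would print the matching tuples.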

II) Convert the datetime object into a hierarchy of subgroups

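import h5py
import numpy as np
h = h5py.File('myfile.hdf5', 'w')  # 'w' creates/truncates; h5py >= 3.0 opens read-only by default
for k, v in d.items():
    # slashes in the name make h5py create intermediate groups: 2012 -> 04 -> 05 -> 23:30
    h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int32))  # int8 cannot hold 1014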

Note the forward slashes in the strftime format string: they make h5py create the corresponding subgroups in the HDF5 file. Data can then be accessed directly, e.g. h['2012']['04']['05']['23:30'][()], by iterating with the h5py iterators, or even by passing a custom function to visititems().
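
visititems() calls the supplied function with each object's path relative to the group it was called on, so with layout II the path of every dataset (e.g. '2012/04/05/23:30') can be parsed straight back into a datetime. A sketch, assuming the '%Y/%m/%d/%H:%M' layout above; `path_to_datetime` and `is_leaf_path` are hypothetical helpers, kept free of h5py so they work on plain strings:

```python
from datetime import datetime

LAYOUT = '%Y/%m/%d/%H:%M'  # same format string passed to create_dataset in option II

def path_to_datetime(path):
    """Convert a dataset path such as '2012/04/05/23:30' back into a datetime."""
    return datetime.strptime(path, LAYOUT)

def is_leaf_path(path):
    """Group paths ('2012', '2012/04', '2012/04/05') have fewer than four components."""
    return path.count('/') == 3

# the kind of names Group.visit() would report for one stored timestamp
paths = ['2012', '2012/04', '2012/04/05', '2012/04/05/23:30']
print([path_to_datetime(p) for p in paths if is_leaf_path(p)])
```

Inside a callback passed to `h.visititems`, checking `isinstance(obj, h5py.Dataset)` is a more robust leaf test than counting slashes. The callback should return None, because the traversal stops as soon as it returns any other value.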

For simplicity, I chose the first option.

Answer 2

Answered by Jason S

I would serialize the object into JSON or YAML and store the resulting string as an attribute in the appropriate object (HDF5 group or dataset).
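
For the dictionary in the question, the keys need converting first: JSON object keys must be strings, and datetime objects are not JSON-serializable. A minimal sketch, assuming ISO 8601 strings for the keys and JSON lists for the tuples (JSON has no tuple type); `to_json` and `from_json` are hypothetical helper names:

```python
import json
from datetime import datetime

def to_json(d):
    """Serialize {datetime: tuple_of_ints} to JSON; keys become ISO 8601 strings."""
    return json.dumps({k.isoformat(): list(v) for k, v in d.items()}, sort_keys=True)

def from_json(s):
    """Inverse of to_json: parse the keys back into datetimes and the lists back into tuples."""
    return {datetime.fromisoformat(k): tuple(v) for k, v in json.loads(s).items()}

d = {datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0)}
s = to_json(d)
print(s)
print(from_json(s) == d)
```

With h5py the string could then be attached as, e.g., `grp.attrs['data'] = s` and restored with `from_json(grp.attrs['data'])`. HDF5 attributes are meant to be small (64 KiB with the default file format), so a large dictionary may fit better in a string dataset.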

However, I'm not sure why you're using the datetime as a dataset name, unless you absolutely need to look up your dataset directly by datetime.

P.S. For what it's worth, PyTables is a lot easier to use than the low-level h5py.

Answer 3

Answered by wordsforthewise

Nowadays we have deepdish (www.deepdish.io):

import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'dict2': dict2}, compression=('blosc', 9))

Answer 4

Answered by Ameet Deshpande

This question relates to the more general question of how to store any kind of dictionary in HDF5 format. First, convert the dictionary to a string. Then, to recover the dictionary, parse the string with the ast module (import ast). The following code gives an example.

>>> import ast
>>> d = {1: "a", 2: "b"}
>>> s = str(d)
>>> s
"{1: 'a', 2: 'b'}"
>>> ast.literal_eval(s)
{1: 'a', 2: 'b'}
>>> type(ast.literal_eval(s))
<class 'dict'>
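
Note that ast.literal_eval only evaluates Python literals, so it cannot rebuild the question's dictionary directly: str() writes each datetime key as a constructor call, datetime.datetime(2012, 4, 5, 23, 30), which literal_eval rejects with ValueError. A sketch of one workaround, assuming the keys are converted to ISO 8601 strings before str() and parsed back afterwards; `dumps` and `loads` are hypothetical helper names:

```python
import ast
from datetime import datetime

d = {datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0)}

# The direct round trip fails because the datetime keys are not literals
try:
    ast.literal_eval(str(d))
except ValueError as exc:
    print('literal_eval rejected it:', exc)

def dumps(d):
    """str() of the dict with datetime keys replaced by ISO 8601 strings; tuples stay tuples."""
    return str({k.isoformat(): v for k, v in d.items()})

def loads(s):
    """Parse with literal_eval, then turn the ISO string keys back into datetimes."""
    return {datetime.fromisoformat(k): v for k, v in ast.literal_eval(s).items()}

s = dumps(d)
print(s)
print(loads(s) == d)
```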