Python: How to store a dictionary in an HDF5 dataset

Notice: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/16494669/

Time: 2020-08-18 22:49:09  Source: igfitidea

How to store dictionary in HDF5 dataset

python h5py

Asked by theta

I have a dictionary where each key is a datetime object and each value is a tuple of integers:

>>> list(d.items())[0]
(datetime.datetime(2012, 4, 5, 23, 30), (14, 1014, 6, 3, 0))

I want to store it in an HDF5 dataset, but if I try to just dump the dictionary, h5py raises an error:

TypeError: Object dtype dtype('object') has no native HDF5 equivalent

What would be "the best" way to transform this dictionary so that I can store it in an HDF5 dataset?

Specifically, I don't want to just dump the dictionary into a NumPy array, as that would complicate retrieving data by datetime query.

Answered by theta

I found two ways to do this:

I) Transform the datetime object to a string and use it as the dataset name

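import h5py
import numpy as np
h = h5py.File('myfile.hdf5', 'a')  # explicit mode; h5py 3 defaults to read-only
for k, v in d.items():
    # int16 rather than int8: values such as 1014 do not fit in int8
    h.create_dataset(k.strftime('%Y-%m-%dT%H:%M:%SZ'), data=np.array(v, dtype=np.int16))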

The data can then be accessed by querying the key strings (the dataset names). For example:

for ds in h.keys():
    if '2012-04' in ds:
        print(h[ds][()])  # [()] reads the whole dataset; .value was removed in h5py 3.0
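
Putting option I together, a minimal self-contained sketch might look like the following. It assumes h5py 3.x and NumPy are installed. The sample dictionary `d`, the helper name `query_by_prefix`, the file name, and the in-memory file settings (`driver='core'`, `backing_store=False`) are illustrative assumptions, not part of the original answer.

```python
import datetime

import h5py
import numpy as np

# Hypothetical sample data in the shape described in the question.
d = {
    datetime.datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime.datetime(2012, 4, 6, 0, 0): (15, 1013, 5, 2, 0),
    datetime.datetime(2012, 5, 1, 12, 0): (20, 1009, 3, 1, 1),
}

FMT = '%Y-%m-%dT%H:%M:%SZ'


def query_by_prefix(h5file, prefix):
    """Return {dataset name: tuple of ints} for names starting with prefix."""
    return {
        name: tuple(int(x) for x in h5file[name][()])
        for name in h5file.keys()
        if name.startswith(prefix)
    }


# In-memory file so the sketch leaves nothing on disk (a choice made for this demo).
with h5py.File('sketch_flat.h5', 'w', driver='core', backing_store=False) as h:
    for k, v in d.items():
        # One dataset per dictionary entry, named by the formatted timestamp.
        h.create_dataset(k.strftime(FMT), data=np.array(v, dtype=np.int32))
    april = query_by_prefix(h, '2012-04')

for name, values in sorted(april.items()):
    print(name, values)
```

Because the names sort lexicographically in timestamp order, a prefix such as `'2012-04'` selects a whole month; matching on a prefix rather than a substring avoids accidental hits elsewhere in the name.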

II) Transform the datetime object into dataset subgroups

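h = h5py.File('myfile.hdf5', 'a')  # explicit mode; h5py 3 defaults to read-only
for k, v in d.items():
    # int16 rather than int8: values such as 1014 do not fit in int8
    h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int16))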

Notice the forward slashes in the strftime string, which create the appropriate subgroups in the HDF5 file. Data can be accessed directly, like h['2012']['04']['05']['23:30'][()] (the older .value attribute was removed in h5py 3.0), or by iterating with the iterators h5py provides, or even by using custom functions through visititems().
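
As a rough illustration of option II, the sketch below writes nested groups and then reads them back three ways: by full path, by listing one level, and by walking the tree with `visititems()`. It assumes h5py 3.x and NumPy. The sample data, the callback name `collect`, the file name, and the in-memory file settings are hypothetical choices made for the demo.

```python
import datetime

import h5py
import numpy as np

# Hypothetical sample data in the shape described in the question.
d = {
    datetime.datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime.datetime(2012, 5, 1, 12, 0): (20, 1009, 3, 1, 1),
}

found = {}


def collect(name, obj):
    """visititems callback: record every dataset under its path."""
    if isinstance(obj, h5py.Dataset):
        found[name] = tuple(int(x) for x in obj[()])
    # Returning None tells h5py to keep walking the tree.


with h5py.File('sketch_nested.h5', 'w', driver='core', backing_store=False) as h:
    for k, v in d.items():
        # Slashes in the name make h5py create the intermediate groups.
        h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int32))

    direct = tuple(int(x) for x in h['2012/04/05/23:30'][()])  # full-path lookup
    months = sorted(h['2012'].keys())                          # one level of the tree
    h.visititems(collect)                                      # recursive walk

print(direct)
print(months)
print(sorted(found))
```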

For simplicity, I chose the first option.

Answered by Jason S

I would serialize the object into JSON or YAML and store the resulting string as an attribute in the appropriate object (HDF5 group or dataset).
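
One way this suggestion could look with h5py is sketched below, using JSON from the standard library. It is only an illustration: the dataset name `readings`, the attribute name `metadata`, the placeholder data, and the choice to write datetime keys as ISO 8601 strings are all assumptions, since the answer does not specify them.

```python
import datetime
import json

import h5py
import numpy as np

d = {datetime.datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0)}

# JSON object keys must be strings, so datetimes are written in ISO 8601 form
# and tuples become lists.
payload = json.dumps({k.isoformat(): list(v) for k, v in d.items()})

# In-memory file so the sketch leaves nothing on disk (a choice made for this demo).
with h5py.File('sketch_attr.h5', 'w', driver='core', backing_store=False) as h:
    ds = h.create_dataset('readings', data=np.zeros(1))  # placeholder dataset
    ds.attrs['metadata'] = payload                       # stored as an HDF5 string attribute
    raw = ds.attrs['metadata']

# Undo the conversions made before serializing.
restored = {
    datetime.datetime.fromisoformat(k): tuple(v)
    for k, v in json.loads(raw).items()
}
print(restored == d)
```

Attributes are generally intended for small pieces of metadata, so this fits a modest dictionary better than a large one.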

I'm not sure why you're using the datetime as a dataset name, however, unless you absolutely need to look up your dataset directly by datetime.

p.s. For what it's worth, PyTables is a lot easier to use than the low-level h5py.

Answered by wordsforthewise

Nowadays we have deepdish (www.deepdish.io):

import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'dict2': dict2}, compression=('blosc', 9))

Answered by Ameet Deshpande

This question relates to the more general question of being able to store any type of dictionary in HDF5 format. First, convert the dictionary to a string. Then, to recover the dictionary, use the ast library (import ast). The following code gives an example.

>>> import ast
>>> d = {1: "a", 2: "b"}
>>> s = str(d)
>>> s
"{1: 'a', 2: 'b'}"
>>> ast.literal_eval(s)
{1: 'a', 2: 'b'}
>>> type(ast.literal_eval(s))
<class 'dict'>
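
Note that ast.literal_eval only parses Python literals, so the plain str() round trip does not work for the dictionary in the question, whose keys are datetime objects. One possible workaround, sketched here on the assumption that ISO 8601 strings are an acceptable key format, is to convert the keys before and after. The helper names `to_literal` and `from_literal` are hypothetical.

```python
import ast
import datetime

d = {datetime.datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0)}

# str(d) contains "datetime.datetime(...)", which is a call, not a literal,
# so literal_eval refuses to evaluate it.
try:
    ast.literal_eval(str(d))
    plain_round_trip_ok = True
except (ValueError, SyntaxError):
    plain_round_trip_ok = False


def to_literal(mapping):
    """Render the mapping as a string made only of Python literals."""
    return str({k.isoformat(): v for k, v in mapping.items()})


def from_literal(text):
    """Parse the string back and rebuild the datetime keys."""
    return {
        datetime.datetime.fromisoformat(k): v
        for k, v in ast.literal_eval(text).items()
    }


s = to_literal(d)
restored = from_literal(s)
print(plain_round_trip_ok, restored == d)
```

The resulting string `s` can then be stored like any other string, for example as an attribute.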