Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/18274973/

Date: 2020-09-13 21:05:43  Source: igfitidea

Using compression with Pandas and HD5 / HDFStore

Tags: python, pandas, hdf5

Asked by TravisVOX

For a few aspects of a project, using "h5" storage would be ideal. However, the files are becoming massive and frankly we're running out of space.

This statement...

 store.put(storekey, data, table=False, compression='gzip')

does not produce any difference in file size compared to...

 store.put(storekey, data, table=False)

Is using compression even possible when going through Pandas?

... If it isn't possible, I don't mind using h5py; however, I'm uncertain what to put for a "datatype", as the DataFrame contains all sorts of types (strings, floats, ints, etc.).

Any help/insight would be appreciated!

Answered by Jeff

See the docs regarding compression using HDFStore.

gzip is not a valid compression option (and is ignored; that's a bug). Try any of zlib, bzip2, lzo, or blosc (bzip2/lzo might need extra libraries installed).
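
As a rough illustration of the advice above (an editorial sketch, not part of the original answer), the snippet below writes the same DataFrame with and without zlib compression and compares the file sizes. It assumes a recent pandas with PyTables installed, where compression is requested through the complib and complevel arguments of HDFStore and format="fixed" plays the role of the question's table=False. The helper names hdf_size and make_frame and the sample data are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def hdf_size(df, complib=None, complevel=0):
    """Write df to a temporary HDF5 file and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.h5")
        # Compression is configured on the store: complib names the
        # compressor, complevel (0-9) sets its strength; 0 means none.
        with pd.HDFStore(path, mode="w", complib=complib,
                         complevel=complevel) as store:
            # format="fixed" is assumed here as the modern spelling of
            # the question's table=False.
            store.put("data", df, format="fixed")
        return os.path.getsize(path)


def make_frame(rows=100_000):
    # Hypothetical, highly repetitive numeric data so that the effect
    # of compression is easy to see.
    return pd.DataFrame({"a": np.zeros(rows), "b": np.arange(rows) % 10})


if __name__ == "__main__":
    frame = make_frame()
    print("uncompressed:", hdf_size(frame))
    print("zlib level 9:", hdf_size(frame, complib="zlib", complevel=9))
```

How much space is saved depends heavily on the data; columns of arbitrary Python objects, for example, may not shrink the way plain numeric columns do.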

See the PyTables docs on the various compression options.

Here's a semi-related question.

Answered by Quentin Stafford-Fraser

I've been quite a fan of HDF5 in the past, but having hit a variety of complications, especially with Pandas HDFStore, I'm starting to think Exdir is a good idea.

http://exdir.readthedocs.io
