Using compression with Pandas and HD5 / HDFStore
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow (not the maintainer of this page).
Original source: http://stackoverflow.com/questions/18274973/
Asked by TravisVOX
For a few aspects of a project, using "h5" storage would be ideal. However, the files are becoming massive and frankly we're running out of space.
This statement...
 store.put(storekey, data, table=False, compression='gzip')
does not produce any difference in file size compared to...
 store.put(storekey, data, table=False)
Is using compression even possible when going through Pandas?
If it isn't possible, I don't mind using h5py. However, I'm uncertain what to put for a "datatype", as the DataFrame contains all sorts of types (strings, floats, ints, etc.).
Any help/insight would be appreciated!
Answered by Jeff
See the docs regarding compression using HDFStore.
gzip is not a valid compression option (and is ignored, that's a bug).
Try any of zlib, bzip2, lzo, blosc (bzip2/lzo might need extra libraries installed).
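To make that concrete, here is a minimal sketch of writing the same frame with and without zlib and comparing file sizes. It assumes a reasonably recent pandas with PyTables installed; the helper name write_store, the key "data", and the file names are made up for illustration, and format="fixed" is used on the assumption that it is the current spelling of the older table=False. The data is deliberately all zeros so the effect is easy to see; real data will compress less.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_store(path, frame, complib=None, complevel=None):
    """Write `frame` to an HDF5 file at `path` and return the file size in bytes.

    `write_store` is a hypothetical helper, not part of pandas.
    """
    # complib/complevel are given when the store is opened, so they apply
    # to everything written through this handle.
    with pd.HDFStore(path, mode="w", complib=complib, complevel=complevel) as store:
        store.put("data", frame, format="fixed")
    return os.path.getsize(path)


if __name__ == "__main__":
    # Highly compressible numeric data (all zeros), purely for demonstration.
    frame = pd.DataFrame(np.zeros((100_000, 4)), columns=list("abcd"))

    with tempfile.TemporaryDirectory() as tmp:
        plain = write_store(os.path.join(tmp, "plain.h5"), frame)
        packed = write_store(
            os.path.join(tmp, "zlib.h5"), frame, complib="zlib", complevel=9
        )

    print(f"uncompressed: {plain} bytes")
    print(f"zlib level 9: {packed} bytes")
```

Note that columns holding Python strings are likely to compress poorly in the fixed format; the table format may be worth trying for mixed-type frames.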
See the PyTables docs on the various compression options.
Here's a semi-related question.
Answered by Quentin Stafford-Fraser
I've been quite a fan of HDF5 in the past, but having hit a variety of complications, especially with Pandas HDFStore, I'm starting to think Exdir is a good idea.

