Which is faster to load: pickle or HDF5 in Python

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original source: http://stackoverflow.com/questions/37928794/

Date: 2020-08-19 20:06:57  Source: igfitidea

which is faster for load: pickle or hdf5 in python

Tags: python, pandas, numpy, dataframe, hdf5

Asked by denvar

Given a 1.5 Gb list of pandas dataframes, which format is fastest for loading compressed data: pickle (via cPickle), hdf5, or something else in Python?


  • I only care about fastest speed to load the data into memory
  • I don't care about dumping the data; it's slow, but I only do it once.
  • I don't care about file size on disk
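The question includes no code, so here is a minimal timing sketch of how such a load comparison can be run (an assumption on my part, not the asker's setup). It times pickle against CSV on a small synthetic frame; HDF5 and Feather can be added the same way, but they need the optional PyTables / pyarrow dependencies.

```python
# Minimal load-time comparison sketch (hypothetical harness, not from the
# question). Swap the synthetic frame for your real data before drawing
# conclusions.
import pathlib
import tempfile
import time

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100_000, 6), columns=list("abcdef"))
tmp = pathlib.Path(tempfile.mkdtemp())

# Dump once in each format (dump speed is irrelevant per the question).
df.to_pickle(tmp / "df.pkl")
df.to_csv(tmp / "df.csv", index=False)

for label, loader in [
    ("pickle", lambda: pd.read_pickle(tmp / "df.pkl")),
    ("csv",    lambda: pd.read_csv(tmp / "df.csv")),
]:
    start = time.perf_counter()
    loaded = loader()
    print(f"{label}: {time.perf_counter() - start:.4f}s, shape={loaded.shape}")
```

To extend this, add `("hdf5", lambda: pd.read_hdf(tmp / "df.h5", "df"))` after a `df.to_hdf(...)` dump, assuming PyTables is installed.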

Answered by MaxU

I would consider only two storage formats: HDF5 (PyTables) and Feather


Here are the results of my read and write comparison for the DF (shape: 4000000 x 6, size in memory 183.1 MB, size of uncompressed CSV - 492 MB).


Comparison of the following storage formats (CSV, CSV.gzip, Pickle, HDF5 with various compression settings):


                  read_s  write_s  size_ratio_to_CSV
storage
CSV               17.900    69.00              1.000
CSV.gzip          18.900   186.00              0.047
Pickle             0.173     1.77              0.374
HDF_fixed          0.196     2.03              0.435
HDF_tab            0.230     2.60              0.437
HDF_tab_zlib_c5    0.845     5.44              0.035
HDF_tab_zlib_c9    0.860     5.95              0.035
HDF_tab_bzip2_c5   2.500    36.50              0.011
HDF_tab_bzip2_c9   2.500    36.50              0.011
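The answer does not show the code behind the table, but the row labels presumably correspond to `pandas.DataFrame.to_hdf` keyword arguments roughly as follows (an assumption on my part):

```python
# Hypothetical mapping from the benchmark labels above to to_hdf kwargs
# (the original answer does not publish its benchmark code).
HDF_VARIANTS = {
    "HDF_fixed":        dict(format="fixed"),   # fastest; not queryable/appendable
    "HDF_tab":          dict(format="table"),   # queryable and appendable
    "HDF_tab_zlib_c5":  dict(format="table", complib="zlib",  complevel=5),
    "HDF_tab_zlib_c9":  dict(format="table", complib="zlib",  complevel=9),
    "HDF_tab_bzip2_c5": dict(format="table", complib="bzip2", complevel=5),
    "HDF_tab_bzip2_c9": dict(format="table", complib="bzip2", complevel=9),
}

# Usage (requires the optional PyTables dependency):
# df.to_hdf("data.h5", key="df", **HDF_VARIANTS["HDF_tab_zlib_c5"])
```

The table suggests the usual trade-off: heavier compression (bzip2, higher `complevel`) shrinks the file but slows both reads and writes.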

But it might be different for you, because all my data was of the datetime dtype, so it's always better to make such a comparison with your real data, or at least with similar data...

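One detail worth noting: the question is about a *list* of DataFrames. A single pickle can hold the whole list in one call, while an HDF5 store keeps each frame under its own key. A sketch of both (the HDF5 part is commented out because it needs the optional PyTables dependency; the structure is an assumption, not code from the thread):

```python
# Persisting a list of DataFrames, as in the question.
import pathlib
import tempfile

import numpy as np
import pandas as pd

dfs = [pd.DataFrame(np.random.randn(1_000, 6)) for _ in range(3)]
path = pathlib.Path(tempfile.mkdtemp()) / "dfs.pkl"

pd.to_pickle(dfs, path)        # one file, the whole list at once
loaded = pd.read_pickle(path)  # returns the list back

# HDF5 equivalent (requires PyTables): one store, one key per frame.
# with pd.HDFStore("dfs.h5") as store:
#     for i, df in enumerate(dfs):
#         store.put(f"df_{i}", df, format="fixed")
```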