Efficiently writing large Pandas data frames to disk
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the CC BY-SA license, link to the original question, and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/19639596/
Asked by user2928791
I am trying to find the best way to efficiently read and write large data frames (250 MB+) from and to disk using Python/Pandas. I've tried all of the methods in Python for Data Analysis, but the performance has been very disappointing.
This is part of a larger project exploring migrating our current analytic/data management environment from Stata to Python. When I compare the read/write times in my tests to those that I get with Stata, Python and Pandas are typically taking more than 20 times as long.
I strongly suspect that I am the problem, not Python or Pandas.
Any suggestions?
Answered by Jeff
Using HDFStore is your best bet (it is not covered very much in the book, and it has changed quite a lot since). You will find its performance is MUCH better than any other serialization method.
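A minimal sketch of the HDFStore round trip the answer describes, assuming the PyTables package (`pip install tables`) is available; the file name and frame contents are illustrative, not from the original post:

```python
import numpy as np
import pandas as pd

# A sample frame standing in for the questioner's large data set
df = pd.DataFrame(np.random.randn(100_000, 5), columns=list("abcde"))

# Write the frame to an HDF5 store on disk
with pd.HDFStore("frame.h5", mode="w") as store:
    store.put("df", df)

# Read it back from disk
with pd.HDFStore("frame.h5", mode="r") as store:
    df2 = store["df"]

assert df.equals(df2)  # the round trip preserves the frame exactly
```

The `df.to_hdf("frame.h5", key="df")` / `pd.read_hdf("frame.h5", "df")` convenience pair does the same thing without managing the store object directly.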

