Efficiently writing large Pandas data frames to disk

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/19639596/



Tags: python, pandas

Asked by user2928791

I am trying to find the best way to efficiently read and write large data frames (250MB+) to and from disk using Python/Pandas. I've tried all of the methods in Python for Data Analysis, but the performance has been very disappointing.


This is part of a larger project exploring migrating our current analytic/data management environment from Stata to Python. When I compare the read/write times in my tests to those that I get with Stata, Python and Pandas are typically taking more than 20 times as long.

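For reference, a comparison like this can be made concrete with a small timing harness along the following lines. This is a sketch, not the asker's original test: the frame shape, file names, and the CSV round trip are illustrative assumptions.

```python
import time

import numpy as np
import pandas as pd

# Illustrative frame of roughly the size in question (the shape is an assumption).
df = pd.DataFrame(np.random.randn(2_000_000, 15))

def time_io(label, write, read):
    """Time one write and one read of the same file, in seconds."""
    t0 = time.perf_counter()
    write()
    t1 = time.perf_counter()
    read()
    t2 = time.perf_counter()
    print(f"{label}: write {t1 - t0:.2f}s, read {t2 - t1:.2f}s")

# CSV is the usual baseline, and tends to be slow at this size.
time_io("csv",
        lambda: df.to_csv("test.csv"),
        lambda: pd.read_csv("test.csv", index_col=0))
```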

I strongly suspect that I am the problem, not Python or Pandas.


Any suggestions?


Answered by Jeff

Using HDFStore is your best bet (it is not covered in much depth in the book, and has changed quite a lot). You will find the performance is MUCH better than any other serialization method.

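For illustration, here is a minimal sketch of round-tripping a frame through HDFStore. It assumes the PyTables package is installed (e.g. `pip install tables`); the file name and key are placeholders.

```python
import numpy as np
import pandas as pd

# Stand-in for the real 250MB+ data.
df = pd.DataFrame(np.random.randn(2_000_000, 15))

# Write: format="fixed" (the default) is the fastest for whole-frame I/O;
# format="table" is slower but supports appending and on-disk queries.
with pd.HDFStore("test.h5", mode="w") as store:
    store.put("df", df, format="fixed")

# Read the whole frame back.
with pd.HDFStore("test.h5", mode="r") as store:
    df2 = store["df"]
```

The same round trip is also available through the `DataFrame.to_hdf` and `pd.read_hdf` convenience wrappers.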