Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/28754658/

Time: 2020-09-13 22:59:23  Source: igfitidea

What's the fastest way to pickle a pandas DataFrame?

Tags: python, pandas, pickle

Question by tegan

Which is better: using the built-in pandas method or pickle.dump?

The standard pickle method looks like this:

pickle.dump(my_dataframe, open('test_pickle.p', 'wb'))

The Pandas built-in method looks like this:

my_dataframe.to_pickle('test_pickle.p')
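For reference, here is a minimal, self-contained sketch of both approaches. The small example frame and the file names via_pickle.p and via_pandas.p are hypothetical. The with block closes the file deterministically instead of leaving it to garbage collection, as the one-liner above does, and both files are read back to check the round trip.

```python
import pickle

import pandas as pd

# Small example frame (hypothetical data, for illustration only)
my_dataframe = pd.DataFrame({"a": range(5), "b": list("vwxyz")})

# Standard library: open the file explicitly so it is closed when the block ends
with open("via_pickle.p", "wb") as f:
    pickle.dump(my_dataframe, f)

# pandas built-in: takes a path and opens/closes the file itself
my_dataframe.to_pickle("via_pandas.p")

# Read both files back
with open("via_pickle.p", "rb") as f:
    from_pickle = pickle.load(f)
from_pandas = pd.read_pickle("via_pandas.p")

# Both round trips should reproduce the original frame exactly
pd.testing.assert_frame_equal(from_pickle, my_dataframe)
pd.testing.assert_frame_equal(from_pandas, my_dataframe)
```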

Answer by tegan

Thanks to @qwwqwwq, I discovered that pandas has a built-in to_pickle method for DataFrames. I did a quick timing test:

In [1]: %timeit pickle.dump(df, open('test_pickle.p', 'wb'))
10 loops, best of 3: 91.8 ms per loop

In [2]: %timeit df.to_pickle('testpickle.p')
10 loops, best of 3: 88 ms per loop

So the built-in seems only marginally faster (88 ms vs. 91.8 ms per loop here). To me this is useful, because it means it's probably not worth refactoring existing code to use the built-in. Hope this helps someone!
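%timeit is an IPython magic, so the test above only runs in IPython or Jupyter. A minimal sketch of a comparable measurement in plain Python with the standard timeit module, assuming a synthetic 100,000 x 10 frame of random floats (not the original data, which is unknown) written to a temporary directory. Absolute numbers will differ by machine and data.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption; the frame used in the original test is unknown)
df = pd.DataFrame(np.random.default_rng(0).random((100_000, 10)))


def dump_with_pickle(path):
    # Standard-library route, closing the file explicitly
    with open(path, "wb") as f:
        pickle.dump(df, f)


def dump_with_pandas(path):
    # pandas built-in route
    df.to_pickle(path)


results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bench.p")
    for name, func in [("pickle.dump", dump_with_pickle), ("to_pickle", dump_with_pandas)]:
        # Mirror the "10 loops, best of 3" format: 3 repeats of 10 calls, keep the fastest
        best = min(timeit.repeat(lambda: func(path), number=10, repeat=3)) / 10
        results[name] = best
        print(f"{name}: {best * 1000:.1f} ms per call")
```

Taking the minimum over several repeats filters out noise from other processes, which matters here because the two timings above differ by only about 4%.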

Answer by H4dr1en

Easy benchmark, right?

[Image: benchmark comparing pickle.dump(df) and df.to_pickle()]

No difference at all. In fact, I expect that pandas implements __getstate__, so that calling pickle.dump(df) is actually the same as calling df.to_pickle().

If you search the pandas source code for __getstate__, for example, you will find that it is implemented on several objects.
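The reasoning above can be checked directly. A minimal sketch, assuming a recent pandas: it lists which classes in DataFrame's method resolution order define __getstate__ themselves (skipping object, which gained a default __getstate__ in Python 3.11), then confirms that a file written by either route can be read by the other. The file names are hypothetical.

```python
import pickle

import pandas as pd

# Classes in DataFrame's MRO that define __getstate__ themselves
# (object is skipped: since Python 3.11 it provides a default __getstate__)
owners = [cls.__qualname__ for cls in pd.DataFrame.__mro__
          if cls is not object and "__getstate__" in vars(cls)]
print(owners)

df = pd.DataFrame({"x": [1, 2, 3]})

# Written by pandas, read back with the standard library
df.to_pickle("from_pandas.p")
with open("from_pandas.p", "rb") as f:
    assert pickle.load(f).equals(df)

# Written by the standard library, read back with pandas
with open("from_stdlib.p", "wb") as f:
    pickle.dump(df, f)
assert pd.read_pickle("from_stdlib.p").equals(df)
```

Interchangeable files fit the answer's point: to_pickle mainly adds conveniences on top of the standard pickle machinery, such as accepting a path directly, inferring compression from the file extension, and a protocol argument.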
