Original source: http://stackoverflow.com/questions/28754658/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What's the fastest way to pickle a pandas DataFrame?
Asked by tegan
Which is better, using Pandas built-in method or pickle.dump?
The standard pickle method looks like this:
pickle.dump(my_dataframe, open('test_pickle.p', 'wb'))
The Pandas built-in method looks like this:
my_dataframe.to_pickle('test_pickle.p')
Answered by tegan
Thanks to @qwwqwwq I discovered that pandas has a built-in to_pickle method for dataframes. I did a quick timing test:
In [1]: %timeit pickle.dump(df, open('test_pickle.p', 'wb'))
10 loops, best of 3: 91.8 ms per loop
In [2]: %timeit df.to_pickle('testpickle.p')
10 loops, best of 3: 88 ms per loop
So it seems that the built-in is only marginally faster. To me, this is useful because it means it's probably not worth refactoring code to use the built-in. Hope this helps someone!
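For anyone who wants to repeat the timing outside IPython, here is a rough sketch using the standard timeit module. The test frame, its size, and the file names are my own assumptions, since the post does not say what df contained. I also pass the highest pickle protocol to pickle.dump explicitly, because to_pickle defaults to it in the pandas versions I know of (worth checking for yours) and the comparison is otherwise not like for like.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: the original post does not describe df.
df = pd.DataFrame(np.random.default_rng(0).random((20_000, 10)))

tmp_dir = tempfile.mkdtemp()
path_dump = os.path.join(tmp_dir, "dump.pkl")
path_builtin = os.path.join(tmp_dir, "builtin.pkl")


def with_pickle_dump():
    # Assumption: matching to_pickle's default protocol keeps the comparison fair.
    with open(path_dump, "wb") as fh:
        pickle.dump(df, fh, protocol=pickle.HIGHEST_PROTOCOL)


def with_to_pickle():
    df.to_pickle(path_builtin)


runs = 5
# Take the best of 3 repeats, as %timeit reports, and convert to seconds per call.
t_dump = min(timeit.repeat(with_pickle_dump, number=runs, repeat=3)) / runs
t_builtin = min(timeit.repeat(with_to_pickle, number=runs, repeat=3)) / runs

print(f"pickle.dump:  {t_dump * 1000:.2f} ms per call")
print(f"df.to_pickle: {t_builtin * 1000:.2f} ms per call")
```

The absolute numbers will depend on the machine, the disk, and the shape of the frame, so treat them as relative only.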
Answered by H4dr1en
Easy benchmark, right?
No difference at all. In fact, I expect that Pandas implements __getstate__ so that calling pickle.dump(df) is actually the same as calling df.to_pickle().
If you search the Pandas source code for __getstate__, for example, you will find that it is implemented on several objects.
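One way to sanity-check that the two routes are interchangeable is to write a frame with each one and read it back with the other. This is only a sketch: the small frame and the file names are made up for illustration, and it does not prove that the bytes on disk are identical, only that each reader accepts the other writer's file.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

tmp_dir = tempfile.mkdtemp()
path_dump = os.path.join(tmp_dir, "dump.pkl")
path_builtin = os.path.join(tmp_dir, "builtin.pkl")

# Write once with each approach.
with open(path_dump, "wb") as fh:
    pickle.dump(df, fh)
df.to_pickle(path_builtin)

# Read each file back with the *other* library's loader.
from_dump = pd.read_pickle(path_dump)
with open(path_builtin, "rb") as fh:
    from_builtin = pickle.load(fh)

print(from_dump.equals(df), from_builtin.equals(df))
```

Note that to_pickle can infer compression from the file extension (for example .gz), in which case plain pickle.load would not read the file directly. The .pkl names here avoid that.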


