pandas: What is the fastest way to pickle a pandas DataFrame?

Question

Asked by tegan

Which is better, using Pandas built-in method or pickle.dump?

The standard pickle method looks like this:

with open('test_pickle.p', 'wb') as f:  # the with block closes the file after writing
    pickle.dump(my_dataframe, f)

The Pandas built-in method looks like this:

my_dataframe.to_pickle('test_pickle.p')

Answer 1

Answered by tegan

Thanks to @qwwqwwq, I discovered that pandas has a built-in to_pickle method for DataFrames. I ran a quick timing test:

In [1]: %timeit pickle.dump(df, open('test_pickle.p', 'wb'))
10 loops, best of 3: 91.8 ms per loop

In [2]: %timeit df.to_pickle('testpickle.p')
10 loops, best of 3: 88 ms per loop

So the built-in appears to be only marginally faster (88 ms vs. 91.8 ms per loop). To me, this is useful because it means it's probably not worth refactoring existing code just to use the built-in. Hope this helps someone!
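
For anyone who wants to repeat the timing outside IPython, here is a minimal sketch using the standard timeit module. The DataFrame shape, the file names, and the helper names dump_with_pickle and dump_with_pandas are my own assumptions; the original post does not say what df contained. I pass pickle.HIGHEST_PROTOCOL to pickle.dump because, as far as I know, to_pickle defaults to it in recent pandas versions, which should make the comparison fairer.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data; the original post does not describe its DataFrame.
df = pd.DataFrame(np.random.rand(50_000, 10))


def dump_with_pickle(frame, path):
    # Close the file explicitly so each run measures a complete write.
    with open(path, "wb") as f:
        pickle.dump(frame, f, protocol=pickle.HIGHEST_PROTOCOL)


def dump_with_pandas(frame, path):
    frame.to_pickle(path)


runs = 5  # calls per timing sample

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "pickle_dump.p")
    path_b = os.path.join(tmp, "to_pickle.p")

    # Take the best of 3 samples, then divide by the number of calls per sample.
    t_pickle = min(timeit.repeat(lambda: dump_with_pickle(df, path_a),
                                 number=runs, repeat=3)) / runs
    t_pandas = min(timeit.repeat(lambda: dump_with_pandas(df, path_b),
                                 number=runs, repeat=3)) / runs

    sizes = (os.path.getsize(path_a), os.path.getsize(path_b))

print(f"pickle.dump:  {t_pickle * 1e3:.1f} ms per call")
print(f"df.to_pickle: {t_pandas * 1e3:.1f} ms per call")
```

The absolute numbers will depend on your disk, data, and pandas version, so treat any small gap between the two as noise unless it holds up across repeated runs.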

Answer 2

Answered by H4dr1en

Easy benchmark, right?

No difference at all. In fact, I expect that pandas implements __getstate__ so that calling pickle.dump(df) is actually the same as calling df.to_pickle().

If you search the pandas source code for __getstate__, for example, you will find that it is implemented on several objects.
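
One way to check the interchangeability claim without reading the source is a cross-load round trip: write one file with each API, then read each file back with the other API. This is only a sketch under my own assumptions (the sample data and file names are made up), and it shows that both files restore an equal DataFrame, not that the two calls share the exact same code path or produce identical bytes.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical sample data, just for the round trip.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "pickle_dump.p")
    path_b = os.path.join(tmp, "to_pickle.p")

    # Write once with the standard library, once with the pandas method.
    with open(path_a, "wb") as f:
        pickle.dump(df, f)
    df.to_pickle(path_b)

    # Cross-load: read each file with the *other* API.
    from_pickle_dump = pd.read_pickle(path_a)
    with open(path_b, "rb") as f:
        from_to_pickle = pickle.load(f)

# Both round trips should give back a DataFrame equal to the original.
same = df.equals(from_pickle_dump) and df.equals(from_to_pickle)
print(same)
```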
