Original source: http://stackoverflow.com/questions/48770542/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
What is the difference between saving a pandas dataframe to pickle and to csv?
Asked by KevinKim
I am learning Python pandas. I saw a tutorial that shows two ways to save a pandas dataframe.
df.to_csv('sub.csv')
and to open it, pd.read_csv('sub.csv')
df.to_pickle('sub.pkl')
and to open it, pd.read_pickle('sub.pkl')
The tutorial says to_pickle is for saving the dataframe to disk. I am confused about this, because when I use to_csv, I do see a csv file appear in the folder, which I assume means it is also saved to disk, right?
In general, why would we want to save a dataframe using to_pickle rather than saving it to csv, txt, or another format?
Answered by Gabriel A
Pickle is a serialized way of storing a pandas dataframe. You are basically writing the exact representation of your dataframe to disk. This means the column types and the index are the same when you load it back. If you simply save it as a csv, you are just storing it as comma-separated text. Depending on your data set, some information (such as column types) will be lost when you load it back up.
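To make the difference concrete, here is a minimal sketch of a round trip through both formats. The dataframe contents and the column names (when, grade, id) are made up for illustration and do not come from the tutorial; the file names follow the question, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example frame (not from the original question): a datetime
# column, a categorical column, and a named, non-default index.
df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2018-02-13", "2018-02-14"]),
        "grade": pd.Categorical(["a", "b"]),
    },
    index=pd.Index([10, 20], name="id"),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sub.csv")
    pkl_path = os.path.join(tmp, "sub.pkl")

    # Save the same dataframe in both formats.
    df.to_csv(csv_path)
    df.to_pickle(pkl_path)

    # Load both back with default arguments.
    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

print("original dtypes:")
print(df.dtypes)

print("dtypes after the csv round trip (the index comes back as a column):")
print(from_csv.dtypes)

print("pickle round trip equals original:", from_pkl.equals(df))
```

With default arguments, the csv version comes back with the dates as plain text, the categorical column as ordinary strings, and the old index as a regular column named id. Much of this can be recovered by passing options such as parse_dates, index_col, and dtype to pd.read_csv, but you have to know and restate the schema yourself, whereas the pickle stores it for you.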