Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/48770542/

Date: 2020-09-14 05:10:21  Source: igfitidea

What is the difference between saving a pandas dataframe to pickle and to csv?

Tags: python, pandas, csv, pickle

Asked by KevinKim

I am learning Python pandas. I saw a tutorial that shows two ways to save a pandas dataframe.

  1. df.to_csv('sub.csv') to save, and pd.read_csv('sub.csv') to open

  2. df.to_pickle('sub.pkl') to save, and pd.read_pickle('sub.pkl') to open

The tutorial says to_pickle saves the dataframe to disk. I am confused by this, because when I use to_csv, I do see a csv file appear in the folder, which I assume is also saved to disk, right?

In general, why would we want to save a dataframe using to_pickle rather than saving it to csv, txt, or another format?

Answered by Gabriel A

Pickle is a serialized way of storing a Pandas dataframe. You are basically writing the exact representation of your dataframe to disk. This means the column types and the index are the same when you load it back. If you simply save it as a csv, you are just storing it as comma-separated text. Depending on your data set, some information will be lost when you load it back up: for example, a datetime column comes back as plain strings unless you ask read_csv to parse it again.
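
The difference can be seen with a small round trip. This is only a minimal sketch under stated assumptions: the dataframe, its column names (when, grade, score) and its index name (key) are made up for illustration, and the files are written to a temporary directory rather than the current folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data: a datetime column, a categorical column,
# a float column, and a named string index.
df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "grade": pd.Categorical(["a", "b"]),
        "score": [1.5, 2.5],
    },
    index=pd.Index(["x", "y"], name="key"),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sub.csv")
    pkl_path = os.path.join(tmp, "sub.pkl")

    # Both calls write a file to disk; only the file format differs.
    df.to_csv(csv_path)
    df.to_pickle(pkl_path)

    # Plain round trips, with no extra arguments.
    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

    # CSV can get most of the structure back, but only if you tell
    # read_csv what the columns were supposed to be.
    restored = pd.read_csv(
        csv_path,
        index_col="key",
        parse_dates=["when"],
        dtype={"grade": "category"},
    )

print("original dtypes:")
print(df.dtypes)
print("after pickle round trip:")
print(from_pkl.dtypes)
print("after plain csv round trip:")
print(from_csv.dtypes)
print("after csv round trip with explicit hints:")
print(restored.dtypes)
```

With the plain read_csv call, the index comes back as an ordinary column named key, and the datetime and categorical columns are no longer datetime or categorical. The pickle round trip needs no hints. The trade-off is that a csv file is plain text that other tools can read, while a pickle file is Python-specific and, per the Python documentation linked below, should only be loaded from sources you trust.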

https://docs.python.org/3/library/pickle.html
