Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/31362573/

Time: 2020-09-13 23:36:52  Source: igfitidea

Performance difference in pandas read_table vs. read_csv vs. from_csv vs. read_excel?

python performance csv pandas dataframe

Asked by pylang

I tend to import .csv files into pandas, but sometimes I may get data in other formats to make DataFrame objects.

Today, I just found out about read_table as a "generic" importer for other formats, and wondered whether there are significant performance differences between the various methods in pandas for reading .csv files, e.g. read_table, from_csv, read_excel.

  1. Do these other methods have better performance than read_csv?
  2. Is read_csv much different from from_csv for creating a DataFrame?

Answered by Daniel Boline

  1. read_table is read_csv with sep=',' replaced by sep='\t'. They are two thin wrappers around the same function, so the performance will be identical. read_excel uses the xlrd package to read xls and xlsx files into a DataFrame; it doesn't handle csv files.
  2. from_csv calls read_table, so no.
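
The equivalence described in this answer can be checked with a small sketch. The in-memory sample data and the variable names below are made up for illustration; from_csv is left out because, as far as I know, it was deprecated and later removed from pandas, so it would not run on a current install.

```python
import io

import pandas as pd

# Hypothetical sample data, kept in memory so no files are needed.
csv_text = "a,b\n1,2\n3,4\n"
tsv_text = csv_text.replace(",", "\t")

# read_csv defaults to sep=',' and read_table defaults to sep='\t'.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_tab = pd.read_table(io.StringIO(tsv_text))

# read_table can also parse comma-separated text if sep is given explicitly.
df_tab_as_csv = pd.read_table(io.StringIO(csv_text), sep=",")

print(df_csv.equals(df_tab))
print(df_csv.equals(df_tab_as_csv))
```

Since both functions only differ in the default separator, choosing between them is a matter of which default suits the file at hand.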

Answered by griffinc

I've found that CSV and tab-delimited text (.txt) are equivalent in read and write speed; both are much faster than reading and writing MS Excel files. However, the Excel format compresses the file size a lot.

For the same 320 MB CSV file (16 MB .xlsx) (i7-7700k, SSD, running Anaconda Python 3.5.3, Pandas 0.19.2)

Using the standard convention import pandas as pd:

2 seconds to read .csv: df = pd.read_csv('foo.csv') (same for pd.read_table)

15.3 seconds to read .xlsx: df = pd.read_excel('foo.xlsx')

10.5 seconds to write .csv: df.to_csv('bar.csv', index=False) (same for .txt)

34.5 seconds to write .xlsx: df.to_excel('bar.xlsx', sheet_name='Sheet1', index=False)
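
For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch. It is not the benchmark used for the numbers above: the time_call helper, the random test data, and the temporary file names are all assumptions for illustration, and the data is far smaller than the 320 MB file. Excel is left out because read_excel and to_excel need an extra engine package installed.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_call(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical test data: 10,000 rows of random floats in 5 columns.
df = pd.DataFrame(np.random.rand(10_000, 5), columns=list("abcde"))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "foo.csv")
    txt_path = os.path.join(tmp, "foo.txt")

    # Write the same frame as comma-separated and tab-separated text.
    _, t_write_csv = time_call(df.to_csv, csv_path, index=False)
    _, t_write_txt = time_call(df.to_csv, txt_path, sep="\t", index=False)

    # Read both files back with the matching reader.
    df_csv, t_read_csv = time_call(pd.read_csv, csv_path)
    df_txt, t_read_txt = time_call(pd.read_table, txt_path)

timings = {
    "write .csv": t_write_csv,
    "write .txt": t_write_txt,
    "read .csv": t_read_csv,
    "read .txt": t_read_txt,
}
for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

A single run on a small file is noisy, so the absolute numbers will vary by machine and pandas version; repeating each call several times and comparing medians gives a fairer picture.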

To write your dataframes to tab-delimited text files you can use:

df.to_csv('bar.txt', sep='\t', index=False)
