Performance difference in pandas read_table vs. read_csv vs. from_csv vs. read_excel?

Question

Asked by pylang

I tend to import .csv files into pandas, but sometimes I may get data in other formats to make DataFrame objects.

Today, I just found out about read_table as a "generic" importer for other formats, and wondered if there were significant performance differences between the various methods in pandas for reading .csv files, e.g. read_table, from_csv, read_excel.

Do these other methods have better performance than read_csv?
Is read_csv much different than from_csv for creating a DataFrame?

Answer 1

Answered by Daniel Boline

read_table is read_csv with sep=',' replaced by sep='\t'. They are two thin wrappers around the same function, so the performance will be identical. read_excel uses the xlrd package to read xls and xlsx files into a DataFrame; it doesn't handle csv files.
from_csv calls read_table, so no.
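
To make the wrapper relationship concrete, here is a minimal sketch. The sample data and variable names are made up for illustration. The last part assumes a current pandas release, where DataFrame.from_csv is no longer available (to my knowledge it was deprecated and then removed in pandas 1.0) and the usual stand-in is read_csv with index_col=0 and parse_dates=True.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample, invented for this illustration.
tsv_text = "name\tvalue\na\t1\nb\t2\n"

# read_table defaults to a tab separator ...
df_table = pd.read_table(io.StringIO(tsv_text))
# ... so read_csv with sep='\t' should parse the same input the same way.
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_table.equals(df_csv))

# Assumption: reproducing from_csv-style behaviour with read_csv by using
# the first column as the index and parsing it as dates.
csv_text = "date,value\n2017-01-01,1\n2017-01-02,2\n"
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(type(df_indexed.index).__name__)
```

Since both readers go through the same parser, any speed difference between them should come from the data and the options passed, not from which function name is used.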

Answer 2

Answered by griffinc

I've found that CSV and tab-delimited text (.txt) are equivalent in read and write speed, and both are much faster than reading and writing MS Excel files. However, the Excel format compresses the file size a lot.

For the same 320 MB CSV file (16 MB .xlsx) (i7-7700k, SSD, running Anaconda Python 3.5.3, Pandas 0.19.2)

Using the standard convention import pandas as pd

2 seconds to read .csv df = pd.read_csv('foo.csv') (same for pd.read_table)

15.3 seconds to read .xlsx df = pd.read_excel('foo.xlsx')

10.5 seconds to write .csv df.to_csv('bar.csv', index=False) (same for .txt)

34.5 seconds to write .xlsx df.to_excel('bar.xlsx', sheet_name='Sheet1', index=False)
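
For anyone who wants to repeat this kind of measurement, the following is a rough sketch of a timing harness, not the script used for the numbers above. The frame size, column names, and the timed helper are assumptions made for illustration, and the Excel steps are left out because they need an extra engine package installed. Absolute timings will depend on the machine and pandas version.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd

# Hypothetical small frame; the 320 MB file from the answer is not reproduced.
df = pd.DataFrame(np.random.rand(10_000, 5), columns=list("abcde"))


def timed(label, func):
    """Run func once, print the elapsed wall-clock time, return its result."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "foo.csv")
    timed("write .csv", lambda: df.to_csv(csv_path, index=False))
    df_csv = timed("read_csv", lambda: pd.read_csv(csv_path))
    # read_table needs sep=',' here because its default separator is a tab.
    df_table = timed("read_table", lambda: pd.read_table(csv_path, sep=","))
```

A single run per step is noisy; repeating each step several times and taking the median would give steadier figures.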

To write your dataframes to tab-delimited text files you can use:

df.to_csv('bar.txt', sep='\t', index=False)
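
As a quick check that such a file reads back cleanly, here is a small round-trip sketch. It writes to an in-memory buffer instead of bar.txt, and the sample frame is hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample frame, invented for this illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write tab-delimited text into an in-memory buffer instead of a file.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)
text = buffer.getvalue()

# Rewind and read it back; read_table expects tabs by default.
buffer.seek(0)
df_back = pd.read_table(buffer)

print(df_back.equals(df))
```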
