csv & xlsx files import to pandas data frame: speed issue
Original question: http://stackoverflow.com/questions/16182822/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by sashkello
Reading data (just 20000 numbers) from an xlsx file takes forever:
import pandas as pd
xlsxfile = pd.ExcelFile("myfile.xlsx")
data = xlsxfile.parse("Sheet1", index_col=None, header=None)
This takes about 9 seconds.
If I save the same file in csv format, reading it takes ~25 ms:
import pandas as pd
csvfile = "myfile.csv"
data = pd.read_csv(csvfile, index_col=None, header=None)
Is this an issue with openpyxl, or am I missing something? Are there any alternatives?
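For anyone who wants to reproduce the comparison, the two reads can be timed with a small helper. This is only a sketch: `best_time` and `compare_readers` are hypothetical names, the default file names are the ones from the question, and the `pd.read_excel(..., sheet_name=...)` call assumes a reasonably recent pandas rather than the version in use when the question was asked.

```python
import time


def best_time(func, *args, repeat=3, **kwargs):
    """Call func repeatedly; return (fastest elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


def compare_readers(xlsx_path="myfile.xlsx", csv_path="myfile.csv"):
    """Time reading the same data from xlsx and csv (paths are assumptions)."""
    import pandas as pd  # imported lazily so best_time has no dependencies

    xlsx_seconds, _ = best_time(
        pd.read_excel, xlsx_path, sheet_name="Sheet1", header=None
    )
    csv_seconds, _ = best_time(pd.read_csv, csv_path, header=None)
    return {"xlsx": xlsx_seconds, "csv": csv_seconds}
```

Taking the fastest of a few runs reduces noise from disk caching, so the numbers are easier to compare between formats.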
Answered by Matti John
xlrd has support for .xlsx files, and this answer suggests that at least the beta version of xlrd with .xlsx support was quicker than openpyxl.
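In newer pandas releases the parser can be chosen explicitly with the `engine` argument of `pd.read_excel`. The sketch below picks an engine from the file extension. Note the assumptions: `pick_engine`, `read_sheet`, and the mapping are hypothetical, and the mapping reflects the situation as I understand it for current releases (xlrd 2.0 and later read only legacy .xls files, so .xlsx goes through openpyxl), which differs from the situation described in this answer.

```python
from pathlib import Path

# Assumed mapping for current pandas/xlrd/openpyxl releases, not for the
# versions discussed in the answer.
ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
}


def pick_engine(path):
    """Return the engine name to pass to pd.read_excel for this file."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no Excel engine known for suffix {suffix!r}")


def read_sheet(path, sheet_name="Sheet1"):
    """Read one sheet without a header row, using the engine chosen above."""
    import pandas as pd  # imported lazily; pick_engine works without pandas

    return pd.read_excel(
        path, sheet_name=sheet_name, header=None, engine=pick_engine(path)
    )
```

Passing the engine explicitly makes it easy to time one parser against another on the same file.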
The current stable version of Pandas (0.11.0) uses openpyxl for .xlsx files, but this has been changed for the next release. If you want to give it a go, you can download the dev version from GitHub.
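Another alternative, given how fast the csv read in the question is, is to parse the xlsx file once and keep a csv copy for later runs. The following is a minimal sketch under stated assumptions: `cache_is_fresh` and `read_excel_cached` are hypothetical helpers, the cache file name is made up, and a csv round trip may not preserve every dtype (dates, for example), so it suits plain numeric data like the 20000 numbers described above.

```python
import os


def cache_is_fresh(source_path, cache_path):
    """True when the cache exists and is at least as new as the source."""
    return os.path.exists(cache_path) and (
        os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )


def read_excel_cached(xlsx_path, sheet_name="Sheet1"):
    """Read a sheet, reusing a csv copy when the xlsx file has not changed."""
    import pandas as pd  # imported lazily; cache_is_fresh is stdlib only

    # Hypothetical naming scheme: one csv per workbook and sheet.
    cache_path = f"{xlsx_path}.{sheet_name}.csv"
    if cache_is_fresh(xlsx_path, cache_path):
        return pd.read_csv(cache_path, header=None)

    # Slow path: parse the workbook once, then write the csv copy.
    data = pd.read_excel(xlsx_path, sheet_name=sheet_name, header=None)
    data.to_csv(cache_path, index=False, header=False)
    return data
```

Only the first call after the workbook changes pays the xlsx parsing cost; later calls read the csv copy.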
