Python: fastest way to write pandas DataFrame to Excel on multiple sheets
Original question: http://stackoverflow.com/questions/25863381/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Pythonista anonymous
I need to export 24 pandas data frames (140 columns x 400 rows) to Excel, each into a different sheet.
I am using pandas' built-in ExcelWriter. Running 24 scenarios, it takes:
51 seconds to write to an .xls file (using xlwt)
86 seconds to write to an .xlsx file (using XlsxWriter)
141 seconds to write to an .xlsm file (using openpyxl)
21 seconds to just run the program (no Excel output)
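For readers who want to reproduce the setup, here is a minimal sketch of writing several data frames to separate sheets with pandas' ExcelWriter. The helper names (make_frames, write_frames), the sheet names, and the output path are assumptions for illustration, not the asker's actual code, and the demo uses small frames so it runs quickly. An Excel engine such as XlsxWriter or openpyxl must be installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_frames(n_sheets, n_rows, n_cols):
    """Build a dict of sheet name -> DataFrame filled with random numbers."""
    rng = np.random.default_rng(0)
    return {
        f"scenario_{i}": pd.DataFrame(rng.random((n_rows, n_cols)))
        for i in range(1, n_sheets + 1)
    }


def write_frames(frames, path, engine=None):
    """Write each DataFrame to its own sheet of a single workbook.

    engine=None lets pandas pick an installed engine for the file extension.
    """
    with pd.ExcelWriter(path, engine=engine) as writer:
        for sheet_name, df in frames.items():
            df.to_excel(writer, sheet_name=sheet_name, index=False)


# Small demo; the question's case would be make_frames(24, 400, 140).
frames = make_frames(n_sheets=3, n_rows=5, n_cols=4)
out_path = os.path.join(tempfile.gettempdir(), "scenarios_demo.xlsx")
write_frames(frames, out_path)
```

Passing engine="xlsxwriter" or engine="openpyxl" to write_frames selects a specific writer, which is the variable the timings above compare.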
The problem with writing to .xls is that the spreadsheet contains no formatting styles, so if I open it in Excel, select a column, and click the 'comma' button to format the numbers, it tells me: 'style comma not found'. I don't get this problem when writing to an .xlsx, but that's even slower.
Any suggestions on how to make the exporting faster? I can't be the first one to have this problem, yet after hours of searching forums and websites I haven't found any definite solution.
The only thing I can think of is to use Python to export to CSV files, and then write an Excel macro to merge all the CSVs into a single spreadsheet.
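The Python half of that CSV workaround might look roughly like the sketch below: one CSV per data frame, written to a single directory. The function name export_frames_to_csv, the file naming scheme, and the demo data are assumptions for illustration; the Excel macro that merges the files is not shown.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def export_frames_to_csv(frames, out_dir):
    """Write each DataFrame in `frames` (name -> DataFrame) to <out_dir>/<name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in frames.items():
        path = out_dir / f"{name}.csv"
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


# Small demo data standing in for the 24 scenario frames.
rng = np.random.default_rng(0)
demo_frames = {f"scenario_{i}": pd.DataFrame(rng.random((5, 4))) for i in range(1, 4)}
csv_paths = export_frames_to_csv(demo_frames, Path(tempfile.gettempdir()) / "csv_export_demo")
```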
The .xls file is 10 MB, and the .xlsx is 5.2 MB.
Thanks!
Answered by jmcnamara
Here is a benchmark for different Python to Excel modules.
And here is the output for 140 columns x (400 x 24) rows using the latest version of the modules at the time of posting:
Versions:
python : 2.7.7
openpyxl : 2.0.5
pyexcelerate: 0.6.3
xlsxwriter : 0.5.7
xlwt : 0.7.5
Dimensions:
Rows = 9600 (400 x 24)
Cols = 140
Times:
pyexcelerate : 11.85
xlwt : 17.64
xlsxwriter (optimised): 21.63
xlsxwriter : 26.76
openpyxl (optimised): 95.18
openpyxl : 119.29
As with any benchmark the results will depend on Python/module versions, CPU, RAM and Disk I/O and on the benchmark itself. So make sure to verify these results for your own setup.
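To check the numbers on your own machine, a rough timing harness along these lines may help. It is not the benchmark script referred to above, only a sketch that times pandas writes through whichever of the XlsxWriter and openpyxl engines happen to be installed. The function name time_excel_write and the demo frame sizes are assumptions; increase the sizes to match your real data.

```python
import importlib.util
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_excel_write(frames, engine):
    """Return the seconds taken to write `frames` to a temporary .xlsx file."""
    fd, path = tempfile.mkstemp(suffix=".xlsx")
    os.close(fd)
    try:
        start = time.perf_counter()
        with pd.ExcelWriter(path, engine=engine) as writer:
            for name, df in frames.items():
                df.to_excel(writer, sheet_name=name, index=False)
        return time.perf_counter() - start
    finally:
        os.remove(path)


# Small demo frames; use 24 frames of 400 x 140 to mirror the question.
rng = np.random.default_rng(0)
bench_frames = {f"sheet_{i}": pd.DataFrame(rng.random((50, 20))) for i in range(1, 4)}

timings = {}
for engine in ("xlsxwriter", "openpyxl"):
    # Only time engines that are actually installed.
    if importlib.util.find_spec(engine) is not None:
        timings[engine] = time_excel_write(bench_frames, engine)

for engine, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{engine}: {seconds:.2f} s")
```

A single run like this is noisy, so repeating each measurement a few times and taking the minimum is usually more trustworthy.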
Also, since you asked specifically about pandas, please note that pandas doesn't support PyExcelerate as a writer engine.
Answered by JohnE
For what it's worth, this is how I format the output in xlwt. The documentation is (or at least was) pretty spotty so I had to guess most of this!
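import xlwt
# Workbook and worksheet to write into (names are illustrative)
wb = xlwt.Workbook()
ws0 = wb.add_sheet('Sheet1')
# Courier at 9 pt (font height is in 1/20 pt) with a thousands separator
style = xlwt.XFStyle()
style.font.name = 'Courier'
style.font.height = 180
style.num_format_str = '#,##0'
# ws0 is a worksheet; row, col and value are example inputs
row, col, value = 0, 0, 1234567
ws0.write(row, col, value, style)
wb.save('formatted.xls')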
Also, I believe I reproduced your error message when attempting to format the resulting spreadsheet in Excel (Office 2010). It's weird, but some of the drop-down toolbar format options work and some don't. They all seem to work fine, though, if I go to "Format Cells" via a right click.

