Original source: http://stackoverflow.com/questions/29974672/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Writing pandas DataFrame to Excel with different formats for different columns
Asked by sparc_spread
I am trying to write a pandas DataFrame to an .xlsx file where different numerical columns would have different formats. For example, some would show only two decimal places, some would show none, some would be formatted as percents with a "%" symbol, etc.
I noticed that DataFrame.to_html() has a formatters parameter that allows one to do just that, mapping different formats to different columns. However, there is no similar parameter on the DataFrame.to_excel() method. The most we have is a float_format that is global to all numbers.
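For reference, the per-column behaviour of the formatters parameter can be sketched as below. This is only an illustration: the column names amount and rate and the sample values are made up for the example, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "amount": [1010.0, 2020.0, 3030.0],
    "rate": [0.1, 0.2, 0.33],
})

# to_html() accepts one formatting callable per column name.
html = df.to_html(formatters={
    "amount": "{:,.2f}".format,  # thousands separator, two decimals
    "rate": "{:.0%}".format,     # percentage with no decimals
})

print(html)
```

By contrast, the float_format argument of to_excel() takes a single format string that is applied to every floating-point column.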
I have read many SO posts that are at least partly related to my question, for example:
- Use the older openpyxl engine to apply formats one cell at a time. This is the approach with which I've had the most success. But it means writing loops to apply formats cell-by-cell, remembering offsets, etc.
- Render percentages by changing the table data itself into strings. Going the route of altering the actual data inspired me to try dealing with decimal place formatting by calling round() on each column before writing to Excel. This works too, but I'd like to avoid altering the data.
- Assorted others, mostly about date formats
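The cell-by-cell openpyxl approach from the first bullet might look roughly like the sketch below. The data, the column names and the column_formats mapping are assumptions made for illustration, and the offsets assume the default single-level index that to_excel() writes into the first column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data and per-column Excel number formats.
df = pd.DataFrame({"amount": [1010.0, 2020.5], "rate": [0.1, 0.25]})
column_formats = {"amount": "#,##0.00", "rate": "0%"}

path = os.path.join(tempfile.mkdtemp(), "cell_by_cell.xlsx")

with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1")
    worksheet = writer.sheets["Sheet1"]

    for name, number_format in column_formats.items():
        # openpyxl columns are 1-based, and column 1 holds the index,
        # so the first data column is worksheet column 2.
        col = df.columns.get_loc(name) + 2
        # Row 1 is the header, so data rows start at row 2.
        for row in range(2, len(df) + 2):
            worksheet.cell(row=row, column=col).number_format = number_format
```

The two offsets (+2 for the column, starting at row 2) are the kind of bookkeeping the bullet describes, and they change if the index is dropped or the sheet is written with a start row or column.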
Are there other more convenient Excel-related functions/properties in the pandas API that can help here, or something similar on openpyxl, or perhaps some way to specify output format metadata directly onto each column in the DataFrame that would then be interpreted downstream by different outputters?
Accepted answer by jmcnamara
You can do this with Pandas 0.16 and the XlsxWriter engine by accessing the underlying workbook and worksheet objects:
import pandas as pd

# Create a Pandas dataframe from some data.
# list() materializes the zip iterator, which is lazy in Python 3.
df = pd.DataFrame(list(zip(
    [1010, 2020, 3030, 2020, 1515, 3030, 4545],
    [.1, .2, .33, .25, .5, .75, .45],
    [.1, .2, .33, .25, .5, .75, .45],
)))

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})
format3 = workbook.add_format({'num_format': 'h:mm:ss AM/PM'})

# Set the column width and format.
# Column A holds the index, so the data columns start at B.
worksheet.set_column('B:B', 18, format1)

# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)
worksheet.set_column('D:D', 16, format3)

# Close the Pandas Excel writer and output the Excel file.
# writer.save() was removed in pandas 2.0; close() writes the file.
writer.close()
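The same idea can be wrapped in a small helper that maps column names to number formats, so the column letters do not have to be worked out by hand. This is only a sketch: the function name write_with_column_formats and its parameters are hypothetical, and it assumes unique column names and the default single-level index written into the first worksheet column.

```python
import os
import tempfile

import pandas as pd


def write_with_column_formats(df, path, formats, sheet_name="Sheet1", width=None):
    """Write df to path, applying an Excel number format per named column.

    Hypothetical helper: `formats` maps DataFrame column names to Excel
    number format strings such as '0%' or '#,##0.00'.
    """
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]

        for name, num_format in formats.items():
            # XlsxWriter columns are 0-based; worksheet column 0 holds the
            # index, so DataFrame column i lands in worksheet column i + 1.
            col = df.columns.get_loc(name) + 1
            cell_format = workbook.add_format({"num_format": num_format})
            # width=None leaves the column width unchanged.
            worksheet.set_column(col, col, width, cell_format)
    return path


# Usage with made-up data and column names.
df = pd.DataFrame({"amount": [1010.0, 2020.0], "rate": [0.1, 0.25]})
out_path = write_with_column_formats(
    df,
    os.path.join(tempfile.mkdtemp(), "formatted.xlsx"),
    {"amount": "#,##0.00", "rate": "0%"},
)
```

Using the ExcelWriter as a context manager closes and saves the file on exit, so no explicit save or close call is needed.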
Answer by Charlie Clark
As you rightly point out, applying formats to individual cells is extremely inefficient.
openpyxl 2.4 includes native support for pandas DataFrames and named styles.
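One way this could look, using dataframe_to_rows and NamedStyle from openpyxl, is sketched below. The data, the column names and the style names money_style and percent_style are assumptions for illustration, not something the answer specifies. Each cell is still assigned a style, but every cell in a column refers to the same shared named style instead of carrying its own format definition.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical sample data.
df = pd.DataFrame({"amount": [1010.0, 2020.5], "rate": [0.1, 0.25]})

wb = Workbook()
ws = wb.active

# Write the header row followed by the data rows (no index column).
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Register one named style per number format (names are made up here).
wb.add_named_style(NamedStyle(name="money_style", number_format="#,##0.00"))
wb.add_named_style(NamedStyle(name="percent_style", number_format="0%"))

# Map DataFrame columns to the named styles.
column_styles = {"amount": "money_style", "rate": "percent_style"}

for col_idx, name in enumerate(df.columns, start=1):
    style_name = column_styles.get(name)
    if style_name is None:
        continue
    # Row 1 is the header, so styling starts at row 2.
    for (cell,) in ws.iter_rows(min_row=2, min_col=col_idx, max_col=col_idx):
        cell.style = style_name

named_path = os.path.join(tempfile.mkdtemp(), "named_styles.xlsx")
wb.save(named_path)
```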