How to sort Excel sheet using Python
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/27469182/
Asked by Ree
I am using Python 3.4 and xlrd. I want to sort the Excel sheet by its primary column before processing it. Is there a library that can do this?
Answered by Andy
There are a couple of ways to do this. The first option is to use xlrd, as you have this tagged. The biggest downside is that xlrd only reads files, and its companion library xlwt writes the older XLS format, not XLSX.
These examples use an Excel document, test.xlsx, with a single column on Sheet1 whose header is "Header Row".
Using xlrd, with a few modifications from this answer:
import xlwt
from xlrd import open_workbook

target_column = 0  # This example only has 1 column, and it is 0 indexed

# Reading .xlsx needs xlrd < 2.0; later releases only read .xls
book = open_workbook('test.xlsx')
sheet = book.sheets()[0]
# range, not xrange: the question uses Python 3
data = [sheet.row_values(i) for i in range(sheet.nrows)]
labels = data[0]  # Don't sort our headers
data = data[1:]  # Data begins on the second row
data.sort(key=lambda x: x[target_column])

bk = xlwt.Workbook()
sheet = bk.add_sheet(sheet.name)
for idx, label in enumerate(labels):
    sheet.write(0, idx, label)
for idx_r, row in enumerate(data):
    for idx_c, value in enumerate(row):
        sheet.write(idx_r+1, idx_c, value)
bk.save('result.xls')  # Notice this is xls, not xlsx like the original file is
This writes result.xls, with the header row left in place and the data rows sorted by the target column.
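The sorting step in the xlrd example can also be tried on its own, without any workbook. The following is a minimal sketch, not part of xlrd or xlwt: `sort_rows` is a hypothetical helper, and the sample rows are made up to resemble what `row_values` might return. It assumes a column can mix numbers and text, which a plain `sort` would reject on Python 3, so the key places numbers before strings.

```python
def sort_rows(rows, target_column=0):
    """Keep the first (header) row in place and sort the remaining rows."""
    header, body = rows[0], rows[1:]

    def key(row):
        value = row[target_column]
        # Numbers come before text, so a mixed column does not raise TypeError
        return (isinstance(value, str), value)

    return [header] + sorted(body, key=key)


# Made-up sample data: one header row followed by mixed numeric and text cells
rows = [["Header Row"], [3.0], ["b"], [1.0], ["a"]]
sorted_rows = sort_rows(rows)
print(sorted_rows)
```

If the column only ever holds one type, the simpler `key=lambda x: x[target_column]` from the example above is enough.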
Another option, and one that can write XLSX output, is to use pandas. The code is also shorter:
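import pandas as pd

xl = pd.ExcelFile("test.xlsx")
df = xl.parse("Sheet1")
# DataFrame.sort was removed from pandas; sort_values replaces it
df = df.sort_values(by="Header Row")

# The context manager saves and closes the file on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', columns=["Header Row"], index=False)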
This writes output.xlsx, containing the sorted "Header Row" column.
In the to_excel call, index is set to False so that the pandas DataFrame index isn't included in the Excel document. The rest of the keywords should be self-explanatory.
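To see what the index is and why it is usually left out, here is a small sketch on an in-memory DataFrame. The three values are made up for illustration, and the column name is simply reused from the example above. After sorting, the index still carries each row's original position, which is rarely wanted as an extra column in the spreadsheet.

```python
import pandas as pd

# Made-up data standing in for the single-column sheet
df = pd.DataFrame({"Header Row": [3, 1, 2]})
sorted_df = df.sort_values(by="Header Row")

# The values are now in order
print(sorted_df["Header Row"].tolist())

# The index keeps the original row positions; index=False leaves it out of the file
print(sorted_df.index.tolist())
```

If the original positions are not needed in memory either, `sorted_df.reset_index(drop=True)` renumbers the rows from 0.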