How to sort an Excel sheet using Python

Disclaimer: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/27469182/

Time: 2020-08-19 01:49:57  Source: igfitidea

How to sort Excel sheet using Python

Tags: python, xlrd

Asked by Ree

I am using Python 3.4 and xlrd. I want to sort the Excel sheet based on the primary column before processing it. Is there a library that can do this?

Answered by Andy

There are a couple of ways to do this. The first option is to use xlrd, as you have this tagged. The biggest downside is that xlrd only reads workbooks, and its companion library xlwt writes the older XLS format, not XLSX.

These examples use an Excel document with this format:

Text Excel Layout

Utilizing xlrd and a few modifications from this answer:

import xlwt
from xlrd import open_workbook

target_column = 0     # This example only has 1 column, and it is 0 indexed

# Note: xlrd 2.0 and later read only .xls files; reading .xlsx needs xlrd < 2.0
book = open_workbook('test.xlsx')
sheet = book.sheets()[0]
# range replaces Python 2's xrange, which does not exist in Python 3
data = [sheet.row_values(i) for i in range(sheet.nrows)]
labels = data[0]    # Don't sort our headers
data = data[1:]     # Data begins on the second row
data.sort(key=lambda x: x[target_column])

bk = xlwt.Workbook()
out_sheet = bk.add_sheet(sheet.name)    # Reuse the source sheet's name

# Write the header row first
for idx, label in enumerate(labels):
    out_sheet.write(0, idx, label)

# Write the sorted data below the header
for idx_r, row in enumerate(data):
    for idx_c, value in enumerate(row):
        out_sheet.write(idx_r+1, idx_c, value)

bk.save('result.xls')    # Notice this is xls, not xlsx like the original file is

This outputs the following workbook:

XLRD output
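
The sorting step in the xlrd approach does not depend on Excel at all, so it can be tried on plain lists first. The following is a minimal sketch under that assumption; `sort_rows`, its parameters, and the sample rows are hypothetical names used only for illustration and are not part of xlrd or xlwt.

```python
def sort_rows(rows, target_column=0, has_header=True):
    """Return a new list of rows sorted by one column, keeping the header first."""
    if not rows:
        return []
    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else rows
    # sorted() is stable, so rows with equal keys keep their original order
    return header + sorted(body, key=lambda row: row[target_column])


# Hypothetical rows in the shape returned by sheet.row_values()
rows = [["Header Row"], ["b"], ["c"], ["a"]]
result = sort_rows(rows)
print(result)
```

One caveat worth keeping in mind: in Python 3, comparing a number with a string raises a TypeError, so a column that mixes numeric and text cells would likely need a key function that converts the values to a common type before sorting.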

Another option, and one that can write XLSX output, is to use pandas. The code is also shorter:

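import pandas as pd

xl = pd.ExcelFile("test.xlsx")
df = xl.parse("Sheet1")
# DataFrame.sort was removed from pandas; sort_values is its replacement
df = df.sort_values(by="Header Row")

# The context manager saves and closes the file on exit,
# replacing the writer.save() call that newer pandas versions removed
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', columns=["Header Row"], index=False)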

This outputs:

Pandas Output

In the to_excel call, index is set to False so that the pandas DataFrame index isn't included in the Excel document. The rest of the keywords should be self-explanatory.
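
To see why dropping the index matters, it can help to look at what sorting does to it. The sketch below uses a small in-memory DataFrame as a stand-in for the parsed sheet; the "Other" column and the sample values are assumptions made for illustration and do not come from the original workbook.

```python
import pandas as pd

# Hypothetical in-memory data standing in for the parsed sheet
df = pd.DataFrame({"Header Row": [3, 1, 2], "Other": ["c", "a", "b"]})

# sort_values returns a new DataFrame; the original row labels travel with the rows
sorted_df = df.sort_values(by="Header Row")

# Descending order on the first column, with ties broken by a second column
sorted_desc = df.sort_values(by=["Header Row", "Other"], ascending=[False, True])

print(sorted_df.index.tolist())           # row labels are reordered, not renumbered
print(sorted_df["Header Row"].tolist())   # values are in ascending order
```

After sorting, the index holds the old row positions in shuffled order. Written to Excel, it would appear as an extra column of seemingly arbitrary numbers, which is what index=False avoids.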
