Python 如何在openpyxl中使用字段名称或列标题?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/34296132/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to use field name or column header in openpyxl?
提问by stahna
See my code below. This code works very well, but I would like to do two things. One thing is I made if statement with or much shorter than actual for example. I have many columns like this, not all next to each other. I would like it to be shorter. Also, sometimes I may not know exact column letter.
请参阅下面的我的代码。这段代码工作得很好,但我想做两件事。一件事是我做了 if 语句或比实际短得多,例如。我有很多这样的专栏,并不是所有的都相邻。我希望它更短。另外,有时我可能不知道确切的列字母。
So I want to know if there is a way to know the column name or header. Like the values that would be in very top row. So I can test to see if it is one of those values to always perform function on that cell if it's in the specified column. I can't find openpyxl function to do column name. Not sure if it understands that first row is different than rest. I think maybe if not I can try to do test on first row, but don't understand how to make this.
所以我想知道是否有办法知道列名或标题。就像位于最顶行的值一样。因此,我可以测试它是否是始终在该单元格上执行功能的那些值之一(如果它在指定列中)。我找不到 openpyxl 函数来做列名。不确定它是否理解第一行与其他行不同。我想也许如果不是我可以尝试在第一行做测试,但不明白如何做到这一点。
So is there a way to call column name? or if there is no way to call column name to test, can someone help me with doing check on first row to see if it has value? then do change on correct row I'm in? Does this make sense.
那么有没有办法调用列名?或者如果无法调用列名进行测试,有人可以帮我检查第一行以查看它是否有价值吗?然后在我所在的正确行上进行更改?这有意义吗。
So instead of code saying:
所以,而不是代码说:
if cellObj.column == 'H' or ...
It would say:
它会说:
if cellObj.column_header == 'NameOfField or ...
Or if not possible to do that, then:
或者,如果不可能做到这一点,那么:
if this cell has column where first row value is 'NameOfField' ...
Please help with best way to do this. I have looked on stackoverflow and in book and blog site, but does not seem to be a way to call column name (not the letter of column).
请帮助最好的方法来做到这一点。我查看过stackoverflow以及书籍和博客站点,但似乎不是一种调用列名(不是列的字母)的方法。
for row in sheet.iter_rows():
for cellObj in row:
if cellObj.column == 'H' or cellObj.column == 'I' or cellObj.column == 'L' or cellObj.column == 'M':
print(cellObj.value),
if cellObj.value.upper() == 'OldValue1':
cellObj.value = 1
print(cellObj.value)
elif cellObj.value.upper() == 'OldValue2':
cellObj.value = 2
print(cellObj.value)
回答by Mike Müller
EDIT
编辑
Assuming these are the header names you are looking for:
假设这些是您要查找的标题名称:
colnames = ['Header1', 'Header2', 'Header3']
Find the indices for these columns:
查找这些列的索引:
col_indices = {n for n, cell in enumerate(sheet.rows[0]) if cell.value in colnames}
Now iterate over the remain rows:
现在迭代剩余的行:
for row in sheet.rows[1:]:
for index, cell in enumerate(row):
if index in col_indices:
if cell.value.upper() == 'OldValue1':
cell.value = 1
print(cell.value)
elif cell.value.upper() == 'OldValue2':
cell.value = 2
print(cell.value)
Use a dictionary instead of a set to keep the column names around:
使用字典而不是集合来保留列名:
col_indices = {n: cell.value for n, cell in enumerate(sheet.rows[0])
if cell.value in colnames}
for row in sheet.rows[1:]:
for index, cell in enumerate(row):
if index in col_indices:
print('col: {}, row: {}, content: {}'.format(
col_indices[index], index, cell.value))
if cell.value.upper() == 'OldValue1':
cell.value = 1
elif cell.value.upper() == 'OldValue2':
cell.value = 2
Old answer
旧答案
This makes your if
statement shorter:
这使您的if
陈述更短:
if cellObj.column in 'HILM':
print(cellObj.value),
For multi letter column coordinates you need to use a list:
对于多字母列坐标,您需要使用列表:
if cellObj.column in ['H', 'AA', 'AB', 'AD']:
print(cellObj.value),
回答by tornesi
You can access cells from the first row and and column using the sheet.cell(row=#, column = #) syntax. For example:
您可以使用 sheet.cell(row=#, column = #) 语法从第一行和第一列访问单元格。例如:
for row in enumerate(sheet.iter_rows()):
for j, cellObj in enumerate(row):
header_cell = sheet.cell(row=1, column=j)
if cellObj.column in ['H', 'I', 'L', 'M', 'AA', 'AB']:
print(cellObj.value),
if cellObj.value.upper() == 'OldValue1':
cellObj.value = 1
print(cellObj.value)
elif cellObj.value.upper() == 'OldValue2':
cellObj.value = 2
print(cellObj.value)
回答by Nuno André
Since row
returns a generator, you can easily extract headers in the first iteration, treat them as you need, and then continue to consume it. For instance:
由于row
返回一个生成器,您可以轻松地在第一次迭代中提取标题,根据需要处理它们,然后继续使用它。例如:
headers = [cell.value for cell in next(sheet.rows)]
# find indexes of targeted columns
cols = [headers.index(header) for header in 'HILM']
conv = {'OldValue1': 1, 'OldValue2': 2}
for row in sheet.rows:
values = [cell.value for cell in row]
for col in cols:
values[col] = conv[values[col]]
回答by Edvrsoft
You have many ways to do this. some approach that i used:
你有很多方法可以做到这一点。我使用的一些方法:
1. Brute force
1. 蛮力
Assuming "sheet" and "workbook" are defined.
假设定义了“工作表”和“工作簿”。
header = [cell for cell in sheet['A1:XFD1'][0] if cell.value is not None and cell.value.strip() != ''] #you get all non-null columns
target_values = ['NameOfField', 'NameOfField1', 'NameOfField2'] #filter list
target_header = [cell.column for cell in header if cell.value in target_values] #get column index
data = {'OldValue1': 1, 'OldValue2': 2}
for row in sheet.iter_rows(max_row=sheet.max_row, max_col=sheet.max_column):
for cell in row:
if cell.column in target_header and cell.value in data :
cell.value = data[cell.value]
In this case, the brute force is in "sheet['A1:XFD1']". we have to check for all columns the first time. But you'll get all cells references for columns. After that, we create target_values(our columns names...) and we create a list with column index (target_header). Finally we iterated over sheet. We check if the cell's column is in the column index and check if the cell's value is in data, so we're able to change the value.
在这种情况下,蛮力在“sheet['A1:XFD1']”中。我们必须第一次检查所有列。但是您将获得列的所有单元格引用。之后,我们创建target_values(我们的列名...)并创建一个带有列索引(target_header)的列表。最后我们迭代了工作表。我们检查单元格的列是否在列索引中并检查单元格的值是否在数据中,因此我们能够更改该值。
Downside:if exists cell with random whitespace outside "data area". max_row and max_column will considerthat cells (iterate over blank cells).
缺点:如果在“数据区域”之外存在带有随机空白的单元格。max_row 和 max_column将考虑该单元格(迭代空白单元格)。
2. Check for bundaries
2. 检查边界
You can use your own max row and max column if the data has table form(no empty space between columns, a column with "id"-> not null, not whitespace).
如果数据具有表格形式(列之间没有空格,带有“id”-> 非空,非空白的列),您可以使用自己的最大行和最大列。
from openpyxl.utils import get_column_letter
def find_limit_sheet(direction):
max_limit_value = 1
while (direction(max_limit_value).value is not None) and (direction(max_limit_value).value.strip() != ''):
max_limit_value = max_limit_value + 1
return (max_limit_value - 1) if max_limit_value != 1 else 1
max_qrow = find_limit_sheet(direction=lambda increment: sheet.cell(row=increment, column=1))
max_qcolumn = find_limit_sheet(direction=lambda increment: sheet.cell(column=increment, row=1))
header = [cell for cell in sheet[f'A1:{get_column_letter(max_qcolumn)}1']] #you get all non-null columns
target_values = ['NameOfField', 'NameOfField1', 'NameOfField2'] #filter list
target_header = [cell.column for cell in header[0] if cell.value in target_values] #get column names
data = {'OldValue1': 1, 'OldValue2': 2}
for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
for cell in row:
if cell.column in target_header and cell.value in data :
cell.value = data[cell.value]
In this case we are inside "data area" only.
在这种情况下,我们仅在“数据区”内。
3. Optional: Using Pandas
3. 可选:使用 Pandas
If you need more complex operation on excel data(i have to read a lots of excel in my work :( as data source). I prefer convert to pandas dataframe-> make operation -> save result .
如果您需要对 excel 数据进行更复杂的操作(我必须在我的工作中阅读大量 excel :( 作为数据源)。我更喜欢转换为 Pandas 数据框-> 制作操作-> 保存结果。
In this case we use all the data.
在这种情况下,我们使用所有数据。
from openpyxl.utils import get_column_letter
import pandas as pd
def find_limit_sheet(direction):
max_limit_value = 1
while (direction(max_limit_value).value is not None) and (direction(max_limit_value).value.strip() != ''):
max_limit_value = max_limit_value + 1
return (max_limit_value - 1) if max_limit_value != 1 else 1
max_qrow = find_limit_sheet(direction=lambda increment: sheet.cell(row=increment, column=1))
max_qcolumn = find_limit_sheet(direction=lambda increment: sheet.cell(column=increment, row=1))
header = [cell.value for cell in sheet[f'A1:{get_column_letter(max_qcolumn)}1'][0]] #you get all non-null columns
raw_data = []
for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
row_data = [cell.value for cell in row]
raw_data.append(dict(zip(header, row_data)))
df = pandas.DataFrame(raw_data)
df.columns = df.iloc[0]
df = df[1:]
You can also use a sub-set of columns using target_datafor example 2.
您还可以使用target_data使用列的子集,例如 2。
...
target_header = [cell.column for cell in header[0] if cell.value in target_values] #get column names
...
raw_data = []
for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
row_data = [cell.value for cell in row if cell.column in target_header]
raw_data.append(dict(zip(header, row_data)))
df = pd.DataFrame(raw_data)
df.columns = df.iloc[0]
df = df[1:]
...
INFO
信息
- openpyxl: 2.6.2
- pandas: 0.24.2
- python: 3.7.3
- Data Structures: List Comprehensions doc
- lambda expr: lambda expression
- openpyxl:2.6.2
- 熊猫:0.24.2
- 蟒蛇:3.7.3
- 数据结构:列表推导式文档
- lambda expr: lambda 表达式