Openpyxl - How to read only one column from an Excel file in Python?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). StackOverFlow original address: http://stackoverflow.com/questions/34754077/

Time: 2020-08-19 15:29:06  Source: igfitidea

Tags: python, excel, openpyxl

Asked by lelarider

I want to pull only column A from my spreadsheet. I have the below code, but it pulls from all columns.

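from openpyxl import load_workbook

# read_only=True streams the rows instead of loading the whole workbook
wb = load_workbook("/home/ilissa/Documents/AnacondaFiles/AZ_Palmetto_MUSC_searchterms.xlsx", read_only=True)
sheet_ranges = wb['PrivAlert Terms']

# min_row=2 starts at the second row
for row in sheet_ranges.iter_rows(min_row=2):
    for cell in row:  # visits every cell in the row, not just column A
        print(cell.value)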

Answered by Thtu

I would suggest using the pandas library.

import pandas as pd
# usecols="A" restricts parsing to column A of the named sheet
dataFrame = pd.read_excel("/home/ilissa/Documents/AnacondaFiles/AZ_Palmetto_MUSC_searchterms.xlsx", sheet_name="PrivAlert Terms", usecols="A")

If you don't feel comfortable in pandas, or for whatever reason need to work with openpyxl, the error in your code is that you aren't selecting only the first column. You explicitly call for each cell in each row. If you only want the first column, then only get the first column in each row.

for row in sheet_ranges.iter_rows(min_row=2):
    print(row[0].value)  # row[0] is the cell in the first column
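
The same idea can be pushed into the range itself, so that only column A is read in the first place. This is a minimal sketch, assuming a recent openpyxl release in which iter_rows() accepts min_col, max_col and values_only; first_column_values is a hypothetical helper name, not part of openpyxl.

```python
def first_column_values(ws, first_row=2):
    """Return the values of column A from first_row down to the last used row.

    ws is assumed to be an openpyxl worksheet (or any object whose
    iter_rows() accepts the same keyword arguments).
    """
    rows = ws.iter_rows(min_row=first_row, min_col=1, max_col=1, values_only=True)
    # Each row is a 1-tuple because min_col == max_col, so unpack its only item.
    return [value for (value,) in rows]
```

For the workbook in the question, this would be called as first_column_values(wb['PrivAlert Terms']).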

Answered by Charlie Clark

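Use ws.get_squared_range() to control precisely the range of cells, such as a single column, that is returned. In recent openpyxl releases this method is no longer available; ws.iter_rows() and ws.iter_cols() accept the same min_row, max_row, min_col and max_col bounds.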
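
As an illustration of bounding the range to a single column, here is a rough sketch built on ws.iter_cols() rather than get_squared_range(). It assumes a recent openpyxl release in which iter_cols() accepts values_only, and a workbook that was not opened in read-only mode; single_column is a hypothetical helper name.

```python
def single_column(ws, column_num, first_row=1, last_row=None):
    """Return the values of one column, limited to rows first_row..last_row.

    column_num is 1-based (1 means column A). ws is assumed to be an
    openpyxl worksheet.
    """
    columns = ws.iter_cols(
        min_col=column_num, max_col=column_num,
        min_row=first_row, max_row=last_row,
        values_only=True,
    )
    # Exactly one column is requested; fall back to an empty tuple if none comes back.
    return list(next(columns, ()))
```

Calling single_column(ws, 1, first_row=2) would then give the values of column A below the first row.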

Answered by Compadre

Here is a simple function:

import openpyxl

def return_column_from_excel(file_name, sheet_name, column_num, first_data_row=1):
    wb = openpyxl.load_workbook(filename=file_name)
    ws = wb[sheet_name]  # look the sheet up by name
    # Restrict the range to a single column; each yielded row is a one-cell tuple
    return ws.iter_rows(min_row=first_data_row, max_row=ws.max_row,
                        min_col=column_num, max_col=column_num)
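
Note that the function above returns rows of cells rather than plain values: each item is a tuple holding a single cell. A minimal sketch for unpacking that result follows; cell_values is a hypothetical helper name, not part of openpyxl.

```python
def cell_values(rows):
    """Flatten an iterable of one-cell row tuples into a list of cell values."""
    # row[0] is the only cell in each tuple; .value is its content.
    return [row[0].value for row in rows]
```

For column A this could be used as cell_values(return_column_from_excel(file_name, sheet_name, 1)).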

Answered by ZLNK

This is an alternative to the previous answers in case you wish to read one or more columns using openpyxl:

import openpyxl

wb = openpyxl.load_workbook('origin.xlsx')
first_sheet = wb.sheetnames[0]
worksheet = wb[first_sheet]

# Iterate over the rows of the chosen columns, starting at row 2
for row in range(2, worksheet.max_row + 1):
    for column in "ADEF":  # add or remove column letters here
        cell_name = "{}{}".format(column, row)
        value = worksheet[cell_name].value  # the value of the specific cell
        # ... your tasks ...

I hope this is useful.

Answered by ewilan

Using ZLNK's excellent response, I created this function that uses list comprehension to achieve the same result in a single line:

def read_column(ws, begin, columns):
    # ws.max_row is the index of the last used row
    return [ws["{}{}".format(column, row)].value for row in range(begin, ws.max_row + 1) for column in columns]

You can then call it by passing a worksheet, the row to begin on, and the letters of the columns you want to return:

column_a_values = read_column(worksheet, 2, 'A')

To return column A and column B, the call changes to this:

column_ab_values = read_column(worksheet, 2, 'AB')
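
With more than one column letter, read_column returns a single flat list in which the values are interleaved row by row (A2, B2, A3, B3, and so on). If the columns should stay separate, one possible variation is sketched below; read_columns_by_letter is a hypothetical name, and like the original it only handles single-letter columns.

```python
def read_columns_by_letter(ws, begin, columns):
    """Return {column letter: [values]} for rows begin..ws.max_row.

    ws is assumed to be an openpyxl worksheet; columns is a string of
    single-letter column names such as 'AB'.
    """
    return {
        column: [ws["{}{}".format(column, row)].value
                 for row in range(begin, ws.max_row + 1)]
        for column in columns
    }
```

The result of read_columns_by_letter(worksheet, 2, 'AB') can then be indexed by letter, for example with ['A'] for column A alone.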

Answered by Harilal Remesan

Using openpyxl

from openpyxl import load_workbook

# The source file is named source.xlsx
wb = load_workbook("source.xlsx")

ws = wb.active
first_column = ws['A']  # tuple of the cells in column A

# Print the contents
for cell in first_column:
    print(cell.value)

Answered by Serhii Aksiutin

Using the openpyxl library and a Python list comprehension:

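import openpyxl

sheet_name = 'Sheet1'  # hypothetical sheet name; replace with your own
book = openpyxl.load_workbook('testfile.xlsx')
user_data = book[sheet_name]
# user_data[x] is row x; index 0 picks its first cell. The + 1 includes the last row.
print([str(user_data[x][0].value) for x in range(1, user_data.max_row + 1)])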

This is a pretty amazing approach and worth a try.

Answered by Lorenzo

In my opinion, this is much simpler:

from openpyxl import Workbook, load_workbook
wb = load_workbook("your excel file")
source = wb["name of the sheet"]
for cell in source['A']:
    print(cell.value)
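
Indexing a worksheet by column letter returns the cells of that column from top to bottom, so the result can be sliced like any tuple. A small sketch that skips the first row and drops empty cells follows; it assumes the first row is a header and that the workbook was not opened in read-only mode, and column_without_header is a hypothetical helper name.

```python
def column_without_header(ws, letter="A"):
    """Return the non-empty values of one column, skipping the first row.

    Assumes row 1 is a header. ws is assumed to be an openpyxl worksheet.
    """
    cells = ws[letter]  # tuple of cells, top to bottom
    return [cell.value for cell in cells[1:] if cell.value is not None]
```

With the names from the snippet above, column_without_header(source) would return the data in column A as a plain list.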