Python Pandas dataframe reading exact specified range in an Excel sheet

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38560748/

Date: 2020-08-19 21:07:59  Source: igfitidea

Tags: python, excel, pandas

Asked by spiff

I have a lot of different tables (and other unstructured data) in an Excel sheet. I need to create a dataframe out of the range 'A3:D20' from 'Sheet2' of the Excel workbook 'data'.

All the examples I come across drill down to sheet level, but none show how to pick data from an exact range.

import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']   #<-- how to specify this?
spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this?

print (spots)

Once I get this, I plan to look up data in column A and find its corresponding value in column B.

Edit 1: I realised that openpyxl takes too long, so I have switched to pandas.read_excel('data.xlsx', 'Sheet2') instead, which is much faster, at that stage at least.

Edit 2: For the time being, I have put my data in just one sheet and:

  • removed all other info
  • added column names
  • applied index_col on my leftmost column
  • then used wb.loc[]
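
The lookup described in Edit 2 can be sketched roughly as follows. This is only an illustration, not the asker's actual data: the column names `key` and `value`, the sample rows, and the helper `lookup` are made up, and the frame is built in memory as a stand-in for what reading the cleaned-up sheet would presumably return.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned-up sheet:
# column A holds the keys, column B holds the values.
df = pd.DataFrame({"key": ["alpha", "beta", "gamma"], "value": [10, 20, 30]})

# Same effect as passing index_col=0 to read_excel:
# the leftmost column becomes the row index.
df = df.set_index("key")


def lookup(frame, key, column="value"):
    """Return the entry in `column` for the row labelled `key`."""
    return frame.loc[key, column]


result = lookup(df, "beta")
print(result)
```

With the index set on the leftmost column, `.loc` does the "find A, return B" step directly, so no explicit loop over rows is needed.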

Accepted answer by ???S???

One way to do this is to use the openpyxl module.

Here's an example:

from openpyxl import load_workbook

wb = load_workbook(filename='data.xlsx', 
                   read_only=True)

ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Transform into dataframe
import pandas as pd
df = pd.DataFrame(data_rows)
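
A possible variation on the same idea, sketched under a few assumptions: the helper name `read_range` is hypothetical, the first row of the range is assumed to hold column names when `header=True`, and openpyxl's `iter_rows(..., values_only=True)` is used in place of the nested loops so that plain values come back directly.

```python
from openpyxl import load_workbook
from openpyxl.utils import range_boundaries
import pandas as pd


def read_range(path, sheet_name, cell_range, header=True):
    """Read a rectangular range such as 'A3:D20' into a DataFrame.

    Hypothetical helper: if `header` is true, the first row of the
    range is used as the column names.
    """
    # range_boundaries returns 1-based (min_col, min_row, max_col, max_row)
    min_col, min_row, max_col, max_row = range_boundaries(cell_range)

    wb = load_workbook(filename=path, read_only=True)
    try:
        ws = wb[sheet_name]
        # values_only=True yields tuples of cell values instead of cell objects
        rows = list(ws.iter_rows(min_row=min_row, max_row=max_row,
                                 min_col=min_col, max_col=max_col,
                                 values_only=True))
    finally:
        wb.close()  # read-only workbooks keep the file open until closed

    if header and rows:
        return pd.DataFrame(rows[1:], columns=list(rows[0]))
    return pd.DataFrame(rows)
```

For the question as asked, the call would presumably be `read_range('data.xlsx', 'Sheet2', 'A3:D20')`.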

Answer by shane

Use the following arguments from the pandas read_excel documentation:

  • skiprows : list-like
    • Rows to skip at the beginning (0-indexed)
  • parse_cols : int or list, default None
    • If None, parse all columns
    • If int, indicates the last column to be parsed
    • If list of ints, indicates the list of column numbers to be parsed
    • If string, indicates a comma-separated list of column names and column ranges (e.g. "A:E" or "A,C,E:F")

I imagine the call will look like:

df = read_excel(filename, 'Sheet2', skiprows = 2, parse_cols = 'A:D')

Answer by ddnsimplon

My answer, tested with pandas 0.25, works well:

pd.read_excel('resultat-elections-2012.xls', sheet_name = 'France entière T1T2', skiprows = 2,  nrows= 5, usecols = 'A:H')
pd.read_excel('resultat-elections-2012.xls', index_col = None, skiprows= 2, nrows= 5, sheet_name='France entière T1T2', usecols=range(0,8))

So: I need the data after the first two lines, so I skip them, then select the desired number of rows (5) and columns A to H.
Be careful: @shane's answer needs to be improved and updated with the new pandas parameters.
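
To tie this back to the original question, here is a rough sketch of how a range string such as 'A3:D20' might be translated into the `skiprows`, `nrows` and `usecols` arguments. The helper name `read_excel_range` is hypothetical, it borrows openpyxl's `range_boundaries` and `get_column_letter` utilities to parse the range, and it only handles `header=0` (first row of the range holds the column names) or `header=None`.

```python
import pandas as pd
from openpyxl.utils import range_boundaries, get_column_letter


def read_excel_range(path, sheet_name, cell_range, header=0):
    """Read a range like 'A3:D20' with pandas.read_excel.

    Hypothetical helper; `header` must be 0 (first row of the range
    holds column names) or None (no header row).
    """
    # 1-based (min_col, min_row, max_col, max_row)
    min_col, min_row, max_col, max_row = range_boundaries(cell_range)

    n_rows = max_row - min_row + 1
    if header is not None:
        n_rows -= 1  # nrows counts data rows only, not the header row

    # Column letters for the usecols string, e.g. 'A:D'
    usecols = f"{get_column_letter(min_col)}:{get_column_letter(max_col)}"

    return pd.read_excel(path, sheet_name=sheet_name, header=header,
                         skiprows=min_row - 1,  # rows above the range
                         nrows=n_rows, usecols=usecols)
```

Under these assumptions, `read_excel_range('data.xlsx', 'Sheet2', 'A3:D20')` would skip two rows, treat row 3 as the header and read the 17 rows below it from columns A to D.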

[Image: my original excel]

[Image: my process with pandas read_excel]