Python Pandas dataframe: reading an exact specified range in an Excel sheet

Question

Asked by spiff

I have a lot of different tables (and other unstructured data) in an Excel sheet. I need to create a dataframe from the range 'A3:D20' of 'Sheet2' in the Excel workbook 'data'.

All the examples I come across drill down to the sheet level, but none show how to pick an exact range.

import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']   #<-- how to specify this?
spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this?

print (spots)

Once I get this, I plan to look up data in column A and find its corresponding value in column B.

Edit 1: I realised that openpyxl takes too long, so I have switched to pandas.read_excel('data.xlsx', 'Sheet2') instead, which is much faster, at that stage at least.

Edit 2: For the time being, I have put my data in just one sheet and:

removed all other info
added column names,
applied index_col on my leftmost column
then used wb.loc[]
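
For the lookup described in Edit 2, a minimal sketch might look like the following. The column names `name` and `value` and the sample rows are invented for illustration; they are not from the original workbook.

```python
import pandas as pd

# Made-up stand-in for the data read from the sheet
spots = pd.DataFrame({"name": ["alpha", "beta", "gamma"], "value": [10, 20, 30]})

# Use the leftmost column as the index, as index_col would do on read
by_name = spots.set_index("name")

# Look up the value that sits next to the key "beta"
result = by_name.loc["beta", "value"]
print(result)
```

If the key column contains duplicates, `.loc` returns a Series rather than a single value.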

Answer 1

Accepted answer by ???S???

One way to do this is to use the openpyxl module.

Here's an example:

from openpyxl import load_workbook

wb = load_workbook(filename='data.xlsx', 
                   read_only=True)

ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Transform into dataframe
import pandas as pd
df = pd.DataFrame(data_rows)
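
The same idea can be wrapped in a small helper, sketched here under a few assumptions: the name `read_range` and its `first_row_is_header` flag are hypothetical, and it relies on openpyxl's `range_boundaries` and `iter_rows(values_only=True)`, so it needs a reasonably recent openpyxl.

```python
from openpyxl import load_workbook
from openpyxl.utils import range_boundaries
import pandas as pd


def read_range(path, sheet_name, cell_range, first_row_is_header=False):
    """Return the cells in cell_range (e.g. 'A3:D20') as a DataFrame.

    Hypothetical helper, not part of openpyxl or pandas.
    """
    # 'A3:D20' -> (1, 3, 4, 20)
    min_col, min_row, max_col, max_row = range_boundaries(cell_range)

    wb = load_workbook(filename=path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name]
        # values_only=True yields plain values instead of cell objects
        rows = [
            list(row)
            for row in ws.iter_rows(
                min_row=min_row, max_row=max_row,
                min_col=min_col, max_col=max_col,
                values_only=True,
            )
        ]
    finally:
        wb.close()

    if first_row_is_header and rows:
        # Treat the first row of the range as column names
        return pd.DataFrame(rows[1:], columns=rows[0])
    return pd.DataFrame(rows)
```

For the question's case this would be called as `read_range('data.xlsx', 'Sheet2', 'A3:D20')`.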

Answer 2

Answer by shane

Use the following arguments from pandas read_excel documentation:

skiprows : list-like
Rows to skip at the beginning (0-indexed)
parse_cols : int or list, default None
If None then parse all columns,
If int then indicates last column to be parsed
If list of ints then indicates list of column numbers to be parsed
If string then indicates comma separated list of column names and column ranges (e.g. “A:E” or “A,C,E:F”)

I imagine the call will look like:

df = read_excel(filename, 'Sheet2', skiprows = 2, parse_cols = 'A:D')
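
Later pandas releases replaced `parse_cols` with `usecols` and added `nrows`, so a call that stops at row 20 might be sketched as below. The wrapper name `read_excel_range` is hypothetical, and `header=None` is an assumption that the range holds only data; without it, pandas would use row 3 as the column names.

```python
import pandas as pd


def read_excel_range(path, sheet_name, first_row, last_row, columns):
    """Read sheet rows first_row..last_row (1-based, inclusive) of `columns`.

    Hypothetical wrapper around pd.read_excel; `columns` is an Excel-style
    string such as 'A:D'.
    """
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        header=None,                       # assume no header row in the range
        skiprows=first_row - 1,            # skip everything above first_row
        nrows=last_row - first_row + 1,    # stop after last_row
        usecols=columns,
    )
```

For range 'A3:D20' the call would be `read_excel_range('data.xlsx', 'Sheet2', 3, 20, 'A:D')`, which amounts to `skiprows=2, nrows=18, usecols='A:D'`.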

Answer 3

Answer by ddnsimplon

My answer, tested with pandas 0.25, works well:

pd.read_excel('resultat-elections-2012.xls', sheet_name = 'France entière T1T2', skiprows = 2,  nrows= 5, usecols = 'A:H')
pd.read_excel('resultat-elections-2012.xls', index_col = None, skiprows= 2, nrows= 5, sheet_name='France entière T1T2', usecols=range(0,8))

So: I need the data after the first two lines; I select the desired number of lines (5) and columns A to H.
Be careful: @shane's answer needs to be updated to use the new pandas parameters.
