pandas: How to read an Excel workbook

Question

Asked by R. Nero

First I want to say that I am not an expert by any means. I am versed but carry a burden of schedule and learning Python like I should have at a younger age!

Question:
I have a workbook that will occasionally have more than one worksheet. When reading the workbook in, I will not know the number of sheets or their names. The data arrangement will be the same on every sheet, with some columns going by the name of 'Unnamed'. The problem is that everything I try or find online uses pandas.ExcelFile to gather all the sheets, which is fine, but I need to be able to skip 4 rows, read only the 42 rows after that, and parse specific columns. Although the sheets might have the exact same structure, the column names might or might not match across sheets, and I would like the sheets to be merged.

So here is what I have:

import pandas as pd
from openpyxl import load_workbook

# Load in the file location and name
cause_effect_file = r'C:\Users\Owner\Desktop\C&E Template.xlsx'

# Set up the ability to write dataframe to the same workbook
book = load_workbook(cause_effect_file)
writer = pd.ExcelWriter(cause_effect_file) 
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

# Get the file skip rows and parse columns needed
xl_file = pd.read_excel(cause_effect_file, skiprows=4, parse_cols = 'B:AJ', na_values=['NA'], convert_float=False)

# Loop through the sheets loading data in the dataframe
dfi = {sheet_name: xl_file.parse(sheet_name)
          for sheet_name in xl_file.sheet_names}

# Remove columns labeled as un-named
for col in dfi:
    if r'Unnamed' in col:
        del dfi[col]

# Write dataframe to sheet so we can see what the data looks like
dfi.to_excel(writer, "PyDF", index=False)

# Save it back to the book
writer.save()

The link to the file I am working with is below: Excel File

Answer 1

Accepted answer by user29791

Try to modify the following based on your specific need:

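import pandas as pd

# Location of the workbook (the path from the question; adjust as needed)
path = r'C:\Users\Owner\Desktop\C&E Template.xlsx'

df = pd.DataFrame()
xls = pd.ExcelFile(path)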

Then iterate over all the available data sheets:

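for sheet_name in xls.sheet_names:
    # header=4 uses the 5th row as column names and skips the rows above it.
    # usecols replaces parse_cols, which newer pandas versions no longer accept.
    a = xls.parse(sheet_name, header=4, usecols='B:AJ')
    a["Sheet Name"] = sheet_name
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    df = pd.concat([df, a], ignore_index=True)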

You can adjust the header row and the columns to read for each sheet. I added a column that will indicate the name of the data sheet the row came from.
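As a rough sketch of how the remaining requirements from the question could be covered (only 42 rows per sheet, dropping the 'Unnamed' columns, merging sheets whose column names differ), something like the following might work. The function names read_all_sheets and combine_sheets are made up for illustration, and the sketch assumes that the 5th row of every sheet holds the column names.

```python
import pandas as pd


def read_all_sheets(path):
    """Read every sheet of the workbook into a dict of DataFrames."""
    # sheet_name=None returns {sheet name: DataFrame} for all sheets.
    # header=4 skips the first 4 rows and uses the 5th as column names;
    # nrows=42 keeps only the 42 data rows that follow the header.
    return pd.read_excel(path, sheet_name=None, header=4, nrows=42,
                         usecols="B:AJ", na_values=["NA"])


def combine_sheets(sheets):
    """Drop 'Unnamed' columns from each sheet and stack the sheets."""
    frames = []
    for sheet_name, frame in sheets.items():
        keep = [c for c in frame.columns if "Unnamed" not in str(c)]
        frame = frame[keep].copy()
        frame["Sheet Name"] = sheet_name
        frames.append(frame)
    # concat aligns on column names; a column missing from a sheet is
    # filled with NaN for that sheet's rows.
    return pd.concat(frames, ignore_index=True, sort=False)
```

Calling combine_sheets(read_all_sheets(path)) would then give one DataFrame for the whole workbook. Because pd.concat matches columns by name, sheets with different column names end up side by side instead of being forced into the same columns.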

Answer 2

Answer by Charlie Clark

You probably want to look at using read_only mode in openpyxl. This will allow you to load only those sheets that you're interested in and look at only the cells you're interested in.

If you want to work with Pandas dataframes then you'll have to create these yourself but that shouldn't be too hard.
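A minimal sketch of that approach, under a few assumptions, might look like this. The function names are hypothetical. The row and column bounds come from the question: 4 rows are skipped, row 5 is assumed to hold the headers, rows 6-47 are the 42 data rows, and columns 2-36 correspond to B:AJ. Passing data_only=True is also an assumption; it asks openpyxl for cached cell values instead of formulas.

```python
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from row tuples; the first row holds the headers."""
    rows = list(rows)
    if not rows:
        return pd.DataFrame()
    header, data = rows[0], rows[1:]
    return pd.DataFrame(data, columns=list(header))


def read_workbook_read_only(path):
    """Return {sheet title: DataFrame} using openpyxl's read-only mode."""
    from openpyxl import load_workbook  # requires openpyxl to be installed

    book = load_workbook(path, read_only=True, data_only=True)
    try:
        frames = {}
        for ws in book.worksheets:
            # Only the requested cell range is read from each sheet.
            rows = ws.iter_rows(min_row=5, max_row=47, min_col=2, max_col=36,
                                values_only=True)
            frames[ws.title] = rows_to_frame(rows)
        return frames
    finally:
        book.close()  # read-only workbooks keep the file open until closed
```

Empty header cells come through as None here, so they do not get pandas' 'Unnamed' labels and would need to be filtered separately.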
