Python: Reading an Excel sheet with multiple headers using Pandas

Question

Asked by muazfaiz

I have an Excel sheet with multiple header rows, like this:

    |    |      Header1      |      Header2      |      Header3      |
ColX|ColY|ColA|ColB|ColC|ColD|ColD|ColE|ColF|ColG|ColH|ColI|ColJ|ColK|
1   |ds  |5   |6   |9   |10  |...
2   |dh  |...
3   |ge  |...
4   |ew  |...
5   |er  |...

As you can see, the first two columns have no top-level header (those cells are blank), while the other columns are grouped under headers such as Header1, Header2 and Header3. I want to read this sheet and merge it with other sheets that have a similar structure.

I want to merge on the first column, 'ColX'. Right now I am doing this:

import pandas as pd

# Start from a frame that holds only the merge key
totalMergedSheet = pd.DataFrame([1, 2, 3, 4, 5], columns=['ColX'])
file = pd.ExcelFile('ExcelFile.xlsx')
# Merge each pair of consecutive sheets on 'ColX', then merge that pair into the running total
for i in range(1, len(file.sheet_names)):
    df1 = file.parse(file.sheet_names[i-1])
    df2 = file.parse(file.sheet_names[i])
    newMergedSheet = pd.merge(df1, df2, on='ColX')
    totalMergedSheet = pd.merge(totalMergedSheet, newMergedSheet, on='ColX')

But I don't think it is reading the columns correctly, and I doubt it will return the result I want. I want the resulting frame to look like this:

    |    |      Header1      |      Header2      |      Header3      |      Header4      |      Header5      |
ColX|ColY|ColA|ColB|ColC|ColD|ColD|ColE|ColF|ColG|ColH|ColI|ColJ|ColK|ColL|ColM|ColN|ColO|ColP|ColQ|ColR|ColS|
1   |ds  |5   |6   |9   |10  |...
2   |dh  |...
3   |ge  |...
4   |ew  |...
5   |er  |...

Any suggestions? Thanks.

Answer 1

Answered by beeftendon

Pandas already has a function that reads an entire Excel workbook for you, so you don't need to parse each sheet by hand. Take a look at pandas.read_excel(). It not only lets you read an Excel file in a single line, it also provides options that help with the problem you're having.

Since you have subcolumns, what you're looking for is MultiIndexing (a column index with more than one level). By default, pandas reads the top row as the sole header row. You can pass a header argument to pandas.read_excel() that indicates which rows to use as headers. In your case you'd want header=[0, 1], meaning the first two rows. You might also have multiple sheets, so you can pass sheet_name=None as well, which tells it to read all sheets (older pandas releases spelled this parameter sheetname). The command would be:

import pandas
df_dict = pandas.read_excel('ExcelFile.xlsx', header=[0, 1], sheet_name=None)

This returns a dictionary whose keys are the sheet names and whose values are the DataFrames for each sheet. If you want to collapse it all into one DataFrame, you can use pandas.concat; with axis=0 it stacks the sheets vertically, one below the other:

df = pandas.concat(df_dict.values(), axis=0)
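To make the header option and the final step concrete, here is a minimal sketch under several assumptions. It uses pandas.read_csv on in-memory CSV text as a stand-in for read_excel, so it runs without a workbook (read_csv accepts the same list form of header). The two "sheets", their names, every value after the question's first row, and the helpers find_column and merge_sheets_side_by_side are hypothetical. It also contrasts the answer's axis=0 concat, which stacks the sheets vertically, with one way to get the side-by-side layout shown in the question: move the ColX column into the index and concatenate along axis=1 so rows line up by key. That second step is a swapped-in technique, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two sheets, written as CSV so the sketch runs
# without an .xlsx file. Row 0 holds the group headers (blank above the key
# columns); row 1 holds the sub-column names. A merged Excel header cell is
# written out once per sub-column here. Values after the first row are made up.
SHEET1_CSV = """\
,,Header1,Header1
ColX,ColY,ColA,ColB
1,ds,5,6
2,dh,7,8
"""
SHEET2_CSV = """\
,Header4,Header4
ColX,ColL,ColM
2,30,40
1,10,20
"""

# header=[0, 1] combines the first two rows into a two-level column MultiIndex.
# The dict mimics the {sheet name: DataFrame} shape read_excel returns
# when called with sheet_name=None.
df_dict = {
    name: pd.read_csv(io.StringIO(text), header=[0, 1])
    for name, text in {"Sheet1": SHEET1_CSV, "Sheet2": SHEET2_CSV}.items()
}
print(df_dict["Sheet1"].columns.nlevels)  # 2


def find_column(df, sub_name):
    """Hypothetical helper: full column tuple whose lowest level is sub_name."""
    return next(col for col in df.columns if col[-1] == sub_name)


def merge_sheets_side_by_side(sheets, key_name="ColX"):
    """Hypothetical helper: align every sheet on its key column, join columns."""
    # Move each sheet's key column into the index so concat can align rows on it.
    indexed = [df.set_index([find_column(df, key_name)]) for df in sheets.values()]
    # axis=1 places the sheets' columns side by side, matching rows by key value.
    return pd.concat(indexed, axis=1)


# axis=0, as in the answer, appends rows; columns a sheet lacks become NaN.
stacked = pd.concat(df_dict.values(), axis=0)
print(stacked.shape)  # (4, 6)

wide = merge_sheets_side_by_side(df_dict)
print(wide.shape)  # (2, 5)
```

In this sketch the blank top-level cells above ColX and ColY come back with generated "Unnamed: ..." labels, which is why the key is looked up by its sub-column name rather than hard-coded. If ColY is present in every sheet, it would appear once per sheet in the joined result, so you may want to drop it from the later sheets (or include it in the index) before concatenating.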
