Pandas: reading an Excel sheet with multiple headers when the first column is empty

Question

Asked by Alexandra Espichán

I have an Excel sheet like this:

I want to read it with pandas read_excel, and I tried this:

df = pd.read_excel("test.xlsx", header=[0,1])

but it throws this error:

ParserError: Passed header=[0,1] are too many rows for this multi_index of columns


Any suggestions?


Answer 1

Answered by Orenshi

If you don't mind massaging the DataFrame after reading the Excel file, you can try the two approaches below:

>>> pd.read_excel("/tmp/sample.xlsx", usecols = "B:F", skiprows=[0])
  header1 Unnamed: 1 Unnamed: 2 header2 Unnamed: 4
0    col1       col2       col3    col4       col5
1       a          0          x       3          d
2       b          1          y       4          e
3       c          2          z       5          f

In the output above, you would still have to fix the first level of the MultiIndex yourself. header1 and header2 are merged cells, so the columns they span come through as "Unnamed: N", and the second header row (col1 to col5) ends up as the first data row.
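One possible way to do that cleanup is sketched below. It is only an illustration: the frame named raw is built by hand from the printed output so that no Excel file is needed, and the names raw, top, bottom, and fixed are made up for this example.

```python
import pandas as pd

# Stand-in for the frame returned by the read_excel call above
# (values copied from the printed output; this is an assumption, not a real file read).
raw = pd.DataFrame(
    [
        ["col1", "col2", "col3", "col4", "col5"],
        ["a", 0, "x", 3, "d"],
        ["b", 1, "y", 4, "e"],
        ["c", 2, "z", 5, "f"],
    ],
    columns=["header1", "Unnamed: 1", "Unnamed: 2", "header2", "Unnamed: 4"],
)

# Top level: blank out the "Unnamed: N" placeholders, then forward-fill
# so each merged header covers every column it spans.
top = pd.Series(raw.columns)
top = top.mask(top.str.startswith("Unnamed")).ffill()

# Second level: the first data row holds the real column names.
bottom = raw.iloc[0]

# Drop that row from the data and attach the two-level column index.
fixed = raw.iloc[1:].reset_index(drop=True)
fixed.columns = pd.MultiIndex.from_arrays([top.to_numpy(), bottom.to_numpy()])
print(fixed)
```

The data columns keep the object dtype they had while the header row was still mixed in, so a follow-up call such as fixed.infer_objects() may be worth adding if numeric dtypes matter.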

>>> pd.read_excel("/tmp/sample.xlsx", header=[0,1], usecols = "B:F", skiprows=[0])
        header1      header2
header1    col1 col2    col3 col4
a             0    x       3    d
b             1    y       4    e
c             2    z       5    f

In the output above, it got pretty close by skipping the empty row and parsing only the columns that contain data (B:F). Notice, though, that the columns got shifted: the first data column (a, b, c) was taken as the row index, so every label sits one column too far to the left.

Note: This is not a clean solution, but I wanted to share samples with you in a post rather than a comment.

-- Edit based on discussion with OP --


Based on the documentation for pandas read_excel, header=[1,2] creates a MultiIndex for your columns. It looks like it determines the row labels for the DataFrame from whatever is populated in column A. Since there is nothing there, the index is filled with NaN, like so:

>>> pd.read_excel("/tmp/sample.xlsx", header=[1,2])
    header1           header2
       col1 col2 col3    col4 col5
NaN       a    0    x       3    d
NaN       b    1    y       4    e
NaN       c    2    z       5    f

Again, if you're okay with cleaning up columns and the first column of the xlsx is always blank, you can drop it as shown below. Hopefully this is what you're looking for.

>>> pd.read_excel("/tmp/sample.xlsx", header=[1,2]).reset_index().drop(['index'], level=0, axis=1)
  header1           header2
     col1 col2 col3    col4 col5
0       a    0    x       3    d
1       b    1    y       4    e
2       c    2    z       5    f
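As a small aside, the same cleanup can probably be written with reset_index(drop=True), which discards the all-NaN index instead of turning it into a column and dropping it afterwards. The sketch below rebuilds the frame by hand from the printed output (an assumption, so no Excel file is read) and compares the two spellings; the names df, cleaned, and same are made up for this example.

```python
import pandas as pd

# Hand-built stand-in for the result of read_excel(..., header=[1,2]):
# two-level columns and an index made entirely of NaN.
columns = pd.MultiIndex.from_tuples(
    [
        ("header1", "col1"), ("header1", "col2"), ("header1", "col3"),
        ("header2", "col4"), ("header2", "col5"),
    ]
)
df = pd.DataFrame(
    [["a", 0, "x", 3, "d"], ["b", 1, "y", 4, "e"], ["c", 2, "z", 5, "f"]],
    columns=columns,
    index=[float("nan")] * 3,
)

# Shorter spelling: throw the NaN index away and renumber the rows.
cleaned = df.reset_index(drop=True)

# Spelling from the answer: move the index into a column, then drop that column.
same = df.reset_index().drop(["index"], level=0, axis=1)

print(cleaned)
```

Both variants leave the two-level column header untouched and only replace the row labels.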

Answer 2

Answered by BallpointBen

Here is the documentation on the header parameter:

Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.

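To illustrate what "combined into a MultiIndex" means, here is a minimal sketch. It uses read_csv on an in-memory string as a stand-in, on the assumption that its header parameter treats a list of row positions the same way; the sample text and the name df are made up for this example.

```python
import io

import pandas as pd

# Two header rows followed by data; the merged cells from the Excel sheet
# are written out explicitly because CSV has no notion of merged cells.
text = """header1,header1,header1,header2,header2
col1,col2,col3,col4,col5
a,0,x,3,d
b,1,y,4,e
c,2,z,5,f
"""

# Rows 0 and 1 together become a two-level column index.
df = pd.read_csv(io.StringIO(text), header=[0, 1])
print(df.columns.tolist())
```

Each column label is then a tuple such as ("header1", "col1"), which is the structure the question is trying to get out of read_excel.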

I think the following should work:


df = pd.read_excel("test.xlsx", skiprows=2, usecols='B:F', header=0)
