pandas 大熊猫读取具有空白顶行和左列的 Excel 文件时出现的问题

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/40026020/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 02:12:17  来源:igfitidea点击:

Problems when pandas reading Excel file that has blank top row and left column

pythonpython-3.xpandas

提问by Nicholas

I tried to read an Excel file that looks like below, enter image description here

我试图读取如下所示的 Excel 文件, 在此处输入图片说明

I was using pandas like this

我正在使用这样的Pandas

xls = pd.ExcelFile(file_path)
assets = xls.parse(sheetname="Sheet1", header=1, index_col=1)

But I got error

但我有错误

ValueError: Expected 4 fields in line 3, saw 5

ValueError:第 3 行中应有 4 个字段,看到 5 个

I also tried

我也试过

assets = xls.parse(sheetname="Sheet1", header=1, index_col=1, parse_cols="B:E")

But I got misparsed result as follows

但我得到了如下的错误解析结果

enter image description here

在此处输入图片说明

Then tried

然后试过

assets = xls.parse(sheetname="Sheet1", header=1, index_col=0, parse_cols="B:E")

Finally works, but why index_col=0 and parse_cols="B:E"? This makes me confused becasue based on pandas documents, assets = xls.parse(sheetname="Sheet1", header=1, index_col=1)should just be fine. Have I missed something?

终于成功了,但为什么 index_col=0 和 parse_cols="B:E"?这让我很困惑,因为基于Pandas文件assets = xls.parse(sheetname="Sheet1", header=1, index_col=1)应该没问题。我错过了什么吗?

采纳答案by Romain

The read_exceldocumentationis not clear on a point.

read_excel文档在某一点上不清楚。

  • skiprows=1to skip the first empty row at the top of the file or header=1also works to use the second row has column index.
  • parse_cols='B:E'is a way to skip the first empty column at the left of the file
  • index_col=0is optional and permits to define the first parsed column (B in this example) as the DataFrameindex. The mistake is here since index_colis relative to columns selected though the parse_colsparameter.
  • skiprows=1跳过文件顶部的第一个空行,或者header=1也可以使用具有列索引的第二行。
  • parse_cols='B:E'是一种跳过文件左侧第一个空列的方法
  • index_col=0是可选的,允许将第一个解析的列(在本例中为 B)定义为DataFrame索引。错误就在这里,因为index_col它与通过parse_cols参数选择的列有关。

With your example, you can use the following code

对于您的示例,您可以使用以下代码

pd.read_excel('test.xls', sheetname='Sheet1', skiprows=1, 
              parse_cols='B:E', index_col=0)

#           AA  BB  CC
# 10/13/16   1  12  -1
# 10/14/16   3  12  -2
# 10/15/16   5  12  -3
# 10/16/16   3  12  -4
# 10/17/16   5  23  -5