pandas 大熊猫读取具有空白顶行和左列的 Excel 文件时出现的问题
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/40026020/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Problems when pandas reading Excel file that has blank top row and left column
提问by Nicholas
I tried to read an Excel file that looks like below,
I was using pandas like this
我正在使用这样的Pandas
xls = pd.ExcelFile(file_path)
assets = xls.parse(sheetname="Sheet1", header=1, index_col=1)
But I got error
但我有错误
ValueError: Expected 4 fields in line 3, saw 5
ValueError:第 3 行中应有 4 个字段,看到 5 个
I also tried
我也试过
assets = xls.parse(sheetname="Sheet1", header=1, index_col=1, parse_cols="B:E")
But I got misparsed result as follows
但我得到了如下的错误解析结果
Then tried
然后试过
assets = xls.parse(sheetname="Sheet1", header=1, index_col=0, parse_cols="B:E")
Finally works, but why index_col=0 and parse_cols="B:E"? This makes me confused becasue based on pandas documents, assets = xls.parse(sheetname="Sheet1", header=1, index_col=1)
should just be fine. Have I missed something?
终于成功了,但为什么 index_col=0 和 parse_cols="B:E"?这让我很困惑,因为基于Pandas文件,assets = xls.parse(sheetname="Sheet1", header=1, index_col=1)
应该没问题。我错过了什么吗?
采纳答案by Romain
The read_excel
documentationis not clear on a point.
该read_excel
文档在某一点上不清楚。
skiprows=1
to skip the first empty row at the top of the file orheader=1
also works to use the second row has column index.parse_cols='B:E'
is a way to skip the first empty column at the left of the fileindex_col=0
is optional and permits to define the first parsed column (B in this example) as theDataFrame
index. The mistake is here sinceindex_col
is relative to columns selected though theparse_cols
parameter.
skiprows=1
跳过文件顶部的第一个空行,或者header=1
也可以使用具有列索引的第二行。parse_cols='B:E'
是一种跳过文件左侧第一个空列的方法index_col=0
是可选的,允许将第一个解析的列(在本例中为 B)定义为DataFrame
索引。错误就在这里,因为index_col
它与通过parse_cols
参数选择的列有关。
With your example, you can use the following code
对于您的示例,您可以使用以下代码
pd.read_excel('test.xls', sheetname='Sheet1', skiprows=1,
parse_cols='B:E', index_col=0)
# AA BB CC
# 10/13/16 1 12 -1
# 10/14/16 3 12 -2
# 10/15/16 5 12 -3
# 10/16/16 3 12 -4
# 10/17/16 5 23 -5