Reading Excel into a Python DataFrame starting from row 5 and including headers

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17548669/

Date: 2020-08-19 08:29:41  Source: igfitidea


Tags: python, excel, import, pandas

Asked by IcemanBerlin

How do I import Excel data into a DataFrame in Python?

Basically, the current Excel workbook runs some VBA on opening, which refreshes a pivot table and does some other stuff.

Then I wish to import the results of the pivot table refresh into a DataFrame in Python for further analysis.

import xlrd

# Raw string so the backslashes in the Windows path are not read as escapes
wb = xlrd.open_workbook(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')

# Sheet names
print(wb.sheet_names())

# Number of sheets
print(wb.nsheets)

The refreshing and opening of the file works fine. But how do I select the data from the first sheet, starting at, say, row 5 (including the header) and going down to the last record n?

Accepted answer by Andy Hayden

You can use the parse method of pandas' ExcelFile to read Excel sheets; see the pandas IO docs:

import pandas as pd

xls = pd.ExcelFile(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')

df = xls.parse('Sheet1', skiprows=4, index_col=None, na_values=['NA'])

skiprows=4 ignores the first 4 rows, so reading starts at zero-based row index 4 (row 5 in Excel), which is used as the header row. parse accepts several other options as well.

Answer by rrawat

The accepted answer is old (as discussed in its comments). The preferred option now is pd.read_excel().
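As a rough illustration of the pd.read_excel() route, here is a minimal sketch. It does not use the workbook from the question: it builds a small in-memory workbook with made-up contents (four filler rows, a header in row 5, then two records) and assumes the openpyxl engine is installed. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Made-up sheet contents: four filler rows, then the header in row 5, then data.
rows = [
    ["report title", None, None],
    ["generated on open", None, None],
    ["pivot refresh", None, None],
    ["notes", None, None],
    ["id", "name", "score"],  # row 5 in Excel: the header
    [1, "a", 10.0],
    [2, "b", "NA"],
]

# Write the rows to an in-memory .xlsx file (assumes openpyxl is available).
buffer = io.BytesIO()
pd.DataFrame(rows).to_excel(buffer, index=False, header=False, engine="openpyxl")
buffer.seek(0)

# skiprows=4 drops the first four rows, so the fifth row supplies the column names.
# sheet_name=0 selects the first sheet; "NA" cells are read as missing values.
df = pd.read_excel(buffer, sheet_name=0, skiprows=4, na_values=["NA"], engine="openpyxl")
print(df)
```

For a file on disk, the same call should work with the path (for example a raw string such as r'C:\...') in place of the buffer.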