Reading Excel into a Python data frame starting from row 5 and including headers
Notice: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17548669/
Asked by IcemanBerlin
How do I import Excel data into a DataFrame in Python?
The current Excel workbook runs some VBA on opening, which refreshes a pivot table and does a few other things.
I then want to import the results of the pivot table refresh into a DataFrame in Python for further analysis.
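import xlrd
# Raw string keeps the backslashes in the Windows path literal.
# Note: xlrd 2.0+ reads only .xls files, so opening an .xlsm workbook needs xlrd < 2.0.
wb = xlrd.open_workbook(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')
# sheet names
print(wb.sheet_names())
# number of sheets
print(wb.nsheets)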
Refreshing and opening the file works fine. But how do I select the data from the first sheet, starting at row 5 (which holds the header) and continuing down to the last record n?
Accepted answer by Andy Hayden
You can use pandas' ExcelFile.parse method to read Excel sheets; see the IO docs:
import pandas as pd
# Raw string keeps the backslashes in the Windows path literal
xls = pd.ExcelFile(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')
df = xls.parse('Sheet1', skiprows=4, index_col=None, na_values=['NA'])
skiprows=4 ignores the first 4 rows, so reading starts at row index 4 (the fifth row of the sheet), which is used as the header. parse accepts several other options as well.
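To make the row arithmetic concrete, here is a minimal sketch that builds a small workbook in memory and parses it back with skiprows=4. The sheet contents (the column names machine and joins, and their values) are made up for illustration, and the sketch assumes pandas is installed together with the openpyxl engine.

```python
import io

import pandas as pd

# Hypothetical data standing in for the refreshed pivot table.
expected = pd.DataFrame({"machine": ["A", "B", "C"], "joins": [10, 20, 30]})

# Write the table so that its header lands on Excel row 5
# (startrow is 0-based, so startrow=4 leaves rows 1-4 empty).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    expected.to_excel(writer, sheet_name="Sheet1", startrow=4, index=False)
buffer.seek(0)

# skiprows=4 drops rows 1-4; the next row (row 5) becomes the header.
xls = pd.ExcelFile(buffer, engine="openpyxl")
df = xls.parse("Sheet1", skiprows=4, index_col=None, na_values=["NA"])
print(df)
```

The resulting frame should carry the column names from row 5 and the records below it, with nothing from the four skipped rows.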
Answer by rrawat
The accepted answer is old (as discussed in the comments on the accepted answer). The preferred option now is pd.read_excel().
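As a rough sketch of the pd.read_excel() route, the helper below reads the first sheet by position and treats a given 1-based Excel row as the header. The function name load_from_row and its parameters are hypothetical, chosen only for illustration; reading .xlsm or .xlsx files is assumed to go through the openpyxl engine.

```python
import pandas as pd


def load_from_row(path, header_row=5, sheet_name=0):
    """Read one sheet, using the 1-based Excel row `header_row` as the header.

    `load_from_row` is a hypothetical helper, not part of pandas.
    """
    return pd.read_excel(
        path,
        sheet_name=sheet_name,    # 0 selects the first sheet by position
        skiprows=header_row - 1,  # rows above the header are dropped
        na_values=["NA"],
    )


# Usage with the workbook path from the question (raw string keeps backslashes literal):
# df = load_from_row(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')
```

Keep in mind that pandas does not run the workbook's VBA. It reads the values last saved in the file, so the pivot table has to be refreshed and the workbook saved before loading it.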