Original source: http://stackoverflow.com/questions/49876077/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas: reading Excel file starting from the row below that with a specific value
Asked by FaCoffee
Say I have the following Excel file:
   A      B  C
0  -      -  -
1  Start  -  -
2  3      2  4
3  7      8  4
4  11     2  17
I want to read the file into a DataFrame, starting from the row below the one that contains the Start value.
Attention: the Start value is not always located in the same row, so if I were to use:
import pandas as pd
xls = pd.ExcelFile(r'C:\Users\MyFolder\MyFile.xlsx')  # raw string, so '\U' is not parsed as an escape
df = xls.parse('Sheet1', skiprows=4, index_col=None)
This would fail, because skiprows needs a fixed row number. Is there a workaround that makes xls.parse look for the string value instead of a row number?
Answered by Abhijit Ghate
df = pd.read_excel('your/path/filename')
This answer helps in finding the location of 'start' in the df:
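row_start = None
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        # The comparison is case-sensitive; use 'Start' for the file in the question
        if df.iat[row, col] == 'start':
            row_start = row
            break
    if row_start is not None:
        break  # also leave the outer loop once the first match is found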
Once you have row_start, you can take the sub-frame that begins at that row:
df_required = df.loc[row_start:]
And if you don't need the row containing 'start', just increment row_start by 1:
df_required = df.loc[row_start+1:]
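As a hedged alternative to the nested loops, the lookup can be vectorized. The sketch below assumes the sheet was read with header=None, so that the Start row arrives as ordinary data; the in-memory frame is only a stand-in for that read, and rows_below_marker is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, header=None): the sheet from the
# question, with empty cells represented as None.
raw = pd.DataFrame(
    [
        [None, None, None],
        ["Start", None, None],
        [3, 2, 4],
        [7, 8, 4],
        [11, 2, 17],
    ],
    columns=["A", "B", "C"],
)


def rows_below_marker(frame, marker="Start"):
    """Return the rows strictly below the first row that contains `marker`."""
    # True for every row in which at least one cell equals the marker.
    hits = frame.eq(marker).any(axis=1)
    if not hits.any():
        raise ValueError(f"marker {marker!r} not found")
    # argmax on a boolean array gives the position of the first True.
    first = int(hits.to_numpy().argmax())
    return frame.iloc[first + 1:].reset_index(drop=True)


below = rows_below_marker(raw)
```

Positional slicing with iloc is used here so the result does not depend on the index labels of the frame that was read.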
Answered by Maxoz99
You could use pd.read_excel(r'C:\Users\MyFolder\MyFile.xlsx', sheet_name='Sheet1'), as it ignores empty Excel cells. (The keyword is sheet_name in current pandas; older versions spelled it sheetname.)
Your DataFrame should then look like this:
   A      B    C
0  Start  NaN  NaN
1  3      2    4
2  7      8    4
3  11     2    17
Then drop the first row and reset the index by using
df = df.drop([0]).reset_index(drop=True)
to get
   A   B  C
0  3   2  4
1  7   8  4
2  11  2  17
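One side effect worth noting: because column A held the string Start, pandas will typically have read it as object dtype, and dropping the row does not change that. A minimal sketch of the cleanup, assuming the frame looks like the one shown earlier in this answer (it is built in memory here as a stand-in for the read_excel result):

```python
import pandas as pd

# Stand-in for the frame read from Excel: the marker row is still present,
# so column A is object dtype and B and C contain NaN in that row.
df = pd.DataFrame(
    {
        "A": ["Start", 3, 7, 11],
        "B": [float("nan"), 2, 8, 2],
        "C": [float("nan"), 4, 4, 17],
    }
)

# Drop the marker row (label 0) and renumber the remaining rows from 0.
cleaned = df.drop([0]).reset_index(drop=True)

# Convert every column back to a numeric dtype now that the string is gone.
cleaned = cleaned.apply(pd.to_numeric)
```

If any non-numeric text remains in a column, pd.to_numeric raises an error by default, which can serve as a quick check that only data rows were kept.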