Pandas: reading Excel file starting from the row below that with a specific value
Original question: http://stackoverflow.com/questions/49876077/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by FaCoffee
Say I have the following Excel file:
A B C
0 - - -
1 Start - -
2 3 2 4
3 7 8 4
4 11 2 17
I want to read the file into a dataframe, making sure that I start reading below the row where the Start value is.
Attention: the Start value is not always located in the same row, so if I were to use:
import pandas as pd
# Raw string, so the backslashes in the Windows path are not read as escape sequences
xls = pd.ExcelFile(r'C:\Users\MyFolder\MyFile.xlsx')
df = xls.parse('Sheet1', skiprows=4, index_col=None)
this would fail, because skiprows has to be a fixed number. Is there any workaround so that xls.parse looks for the string value instead of a row number?
Answered by Abhijit Ghate
df = pd.read_excel('your/path/filename')
This answer helps in finding the location of the start marker in the df:
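row_start = None
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        if df.iat[row, col] == 'Start':  # marker text as it appears in the question's sheet
            row_start = row
            break
    if row_start is not None:
        break  # stop at the first row that contains the marker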
Once you have row_start, you can take a sub-frame of the DataFrame:
df_required = df.loc[row_start:]
And if you don't need the row containing the marker, just increment row_start by 1:
df_required = df.loc[row_start+1:]
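The nested loop above can also be written without explicit iteration. The following is only a minimal sketch of that idea, not part of the original answer: the helper name rows_below_marker and the in-memory sample frame are hypothetical stand-ins for whatever pd.read_excel returns for your file.

```python
import pandas as pd


def rows_below_marker(df: pd.DataFrame, marker: str = "Start") -> pd.DataFrame:
    """Return the rows that come after the first row containing `marker`.

    Hypothetical helper for illustration; the name is not from the answer.
    """
    # Boolean Series: True for every row in which any cell equals the marker.
    hits = df.eq(marker).any(axis=1)
    if not hits.any():
        raise ValueError(f"marker {marker!r} not found")
    # Position (not label) of the first matching row.
    row_start = int(hits.to_numpy().argmax())
    # Slice by position so the result does not depend on the index labels.
    return df.iloc[row_start + 1:].reset_index(drop=True)


# Stand-in for the sheet in the question; in practice this frame would come
# from pd.read_excel on the real file.
df = pd.DataFrame(
    {
        "A": [None, "Start", 3, 7, 11],
        "B": [None, None, 2, 8, 2],
        "C": [None, None, 4, 4, 17],
    },
    dtype=object,
)

result = rows_below_marker(df)
print(result)
```

Slicing with iloc rather than loc is a deliberate choice here: it keeps working even if the frame's index is not the default 0, 1, 2, ... range.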
Answered by Maxoz99
You could use pd.read_excel(r'C:\Users\MyFolder\MyFile.xlsx', sheet_name='Sheet1'), as it ignores empty Excel cells. (Older pandas versions spelled the keyword sheetname; current versions use sheet_name.)
Your DataFrame should then look like this:
A B C
0 Start NaN NaN
1 3 2 4
2 7 8 4
3 11 2 17
Then drop the first row by using
df = df.drop([0]).reset_index(drop=True)
to get
A B C
0 3 2 4
1 7 8 4
2 11 2 17
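One detail worth spelling out: drop returns a new DataFrame rather than modifying the existing one, and a column that once held the text Start stays object-typed after the row is removed. The sketch below illustrates both points on an in-memory stand-in for the frame shown above; the variable name trimmed is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame shown above, with "Start" left over in row 0.
df = pd.DataFrame(
    {
        "A": ["Start", 3, 7, 11],
        "B": [np.nan, 2, 8, 2],
        "C": [np.nan, 4, 4, 17],
    }
)

# drop() returns a new DataFrame; df itself is unchanged unless reassigned.
trimmed = df.drop([0])

# Renumber the rows from 0 and let pandas re-infer a numeric dtype for
# column A now that the "Start" string is gone.
trimmed = trimmed.reset_index(drop=True).infer_objects()

print(trimmed)
print(trimmed.dtypes)
```

Whether the dtype step matters depends on what you do next; arithmetic on an object column still works element by element, but it is slower and easier to get wrong.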

