pandas: merge the first row with the column headers in a DataFrame

Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/46190263/

Time: 2020-09-14 04:26:20  Source: igfitidea

Merge the first row with the column headers in a dataframe

python pandas dataframe

Asked by Anna Jeanine

I am trying to clean up an Excel file for further research. My problem is that I want to merge the first and second rows. The code I have now:

import pandas as pd

xl = pd.ExcelFile("nanonose.xls")
df = xl.parse("Sheet1")
# Drop the unwanted 'Unnamed: 2' column
df = df.drop('Unnamed: 2', axis=1)
## Tried this line but no luck
##print(df.head().combine_first(df.iloc[[0]]))
print(df.head())

The output of this is:

      Nanonose     Unnamed: 1     A     B    C          D          E  \
0  Sample type  Concentration   NaN   NaN  NaN        NaN        NaN   
1        Water           9200  95.5  21.0  6.0  11.942308  64.134615   
2        Water           9200  94.5  17.0  5.0   5.484615  63.205769   
3        Water           9200  92.0  16.0  3.0  11.057692  62.586538   
4        Water           4600  53.0   7.5  2.5   3.538462  35.163462   

           F         G         H  
0        NaN       NaN       NaN  
1  21.498560  5.567840  1.174135  
2  19.658560  4.968000  1.883444  
3  19.813120  5.192480  0.564835  
4   6.876207  1.641724  0.144654 

So my goal is to merge the first and second rows to get these column headers: Sample type | Concentration | A | B | C | D | E | F | G | H

Could someone help me merge these two rows?

Accepted answer by jezrael

I think you need numpy.concatenate, similar in principle to cs95's answer:

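import numpy as np

# Take the first two header names from row 0; keep the existing names for the rest
df.columns = np.concatenate([df.iloc[0, :2], df.columns[2:]])
# Drop the row that held the header names and renumber the index from 0
df = df.iloc[1:].reset_index(drop=True)
print(df)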
  Sample type Concentration     A     B    C          D          E          F  \
0       Water          9200  95.5  21.0  6.0  11.942308  64.134615  21.498560   
1       Water          9200  94.5  17.0  5.0   5.484615  63.205769  19.658560   
2       Water          9200  92.0  16.0  3.0  11.057692  62.586538  19.813120   
3       Water          4600  53.0   7.5  2.5   3.538462  35.163462   6.876207   

          G         H  
0  5.567840  1.174135  
1  4.968000  1.883444  
2  5.192480  0.564835  
3  1.641724  0.144654  
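
The slice :2 in the code above is hard-coded to the two columns whose names sit in the first data row. As a minimal sketch of a variant that does not depend on that position, the example below takes a name from the first row wherever that row has a value and keeps the existing column name otherwise. The small DataFrame is a hypothetical stand-in for the parsed sheet (a few values copied from the question), not the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed sheet (a few values copied from the question)
df = pd.DataFrame({
    "Nanonose": ["Sample type", "Water", "Water"],
    "Unnamed: 1": ["Concentration", 9200, 9200],
    "A": [np.nan, 95.5, 94.5],
    "B": [np.nan, 21.0, 17.0],
})

# Use the first-row value where one exists, otherwise keep the current column name
first_row = df.iloc[0]
df.columns = [name if pd.isna(value) else value
              for name, value in zip(df.columns, first_row)]

# Drop the row that held the header names and renumber the index from 0
df = df.iloc[1:].reset_index(drop=True)

print(df.columns.tolist())  # ['Sample type', 'Concentration', 'A', 'B']
```

This assumes that every missing value in the first row marks a column whose existing name is already correct, which holds for the output shown in the question.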

Answer by cs95

Just reassign df.columns.

df.columns = np.append(df.iloc[0, :2], df.columns[2:])

Or,

df.columns = df.iloc[0, :2].tolist() + (df.columns[2:]).tolist()


Next, skip the first row.

df = df.iloc[1:].reset_index(drop=True) 
df
  Sample type Concentration     A     B    C          D          E          F  \
0       Water          9200  95.5  21.0  6.0  11.942308  64.134615  21.498560   
1       Water          9200  94.5  17.0  5.0   5.484615  63.205769  19.658560   
2       Water          9200  92.0  16.0  3.0  11.057692  62.586538  19.813120   
3       Water          4600  53.0   7.5  2.5   3.538462  35.163462   6.876207   

          G         H  
0  5.567840  1.174135  
1  4.968000  1.883444  
2  5.192480  0.564835  
3  1.641724  0.144654 

reset_index is optional; use it if you want a 0-based index in your final output.
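
To see what reset_index changes, here is a minimal sketch on a small hypothetical DataFrame (a few values copied from the question, standing in for the parsed sheet). It applies the column reassignment from this answer and then compares the index with and without the reset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed sheet (a few values copied from the question)
df = pd.DataFrame({
    "Nanonose": ["Sample type", "Water", "Water"],
    "Unnamed: 1": ["Concentration", 9200, 9200],
    "A": [np.nan, 95.5, 94.5],
    "B": [np.nan, 21.0, 17.0],
})

# First two names come from row 0, the rest from the existing header
df.columns = df.iloc[0, :2].tolist() + df.columns[2:].tolist()

without_reset = df.iloc[1:]                       # keeps the original labels
with_reset = df.iloc[1:].reset_index(drop=True)   # renumbers from 0

print(without_reset.index.tolist())  # [1, 2]
print(with_reset.index.tolist())     # [0, 1]
```

The data is identical in both cases; only the row labels differ.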

Answer by vsnahar

Fetch all the column names present in the second header row, then those in the first header row. Combine them into a single list of column names. Then read the Excel file into a DataFrame with header=[0, 1], and replace its headers with the combined list you created.

import pandas as pd

# Read using the second row as the header; keep only the real names
# (empty header cells are read as "Unnamed: N")
df1 = pd.read_excel('nanonose.xls', header=1)
SecondRowColumns = [c for c in df1.columns
                    if "Unnamed" not in str(c) and "NaN" not in str(c)]

# Read using the first row as the header; skip the "Nanonose" title and empty cells
df2 = pd.read_excel('nanonose.xls', header=0)
FirstRowColumns = [c for c in df2.columns
                   if "Unnamed" not in str(c) and "Nanonose" not in str(c)]

AllColumn = SecondRowColumns + FirstRowColumns

# Read with both rows as a two-level header, then replace it with the flat list
df = pd.read_excel('nanonose.xls', header=[0, 1])
# Assumption: the blank column ('Unnamed: 2' in the question) is entirely empty,
# so dropping all-empty columns makes the column count match AllColumn
df = df.dropna(axis=1, how='all')
df.columns = AllColumn
print(df)
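
The filtering step in this answer can be tried without the Excel file. The sketch below uses a hypothetical list of header names of the kind pandas produces when some header cells are empty. It also illustrates a pitfall when writing such a filter: in Python, an expression of the form ("Unnamed" or "Nanonose" not in c) evaluates to the string "Unnamed", which is always truthy, so each substring needs its own membership test.

```python
# Hypothetical header names, as pandas labels them when some header cells are empty
cols = ["Nanonose", "Unnamed: 1", "Unnamed: 2", "A", "B"]

# Always truthy: `or` returns its first truthy operand, the string "Unnamed",
# so nothing is filtered out
always_true = [c for c in cols if ("Unnamed" or "Nanonose" not in c)]

# Each substring gets its own membership test
skip = ("Unnamed", "Nanonose")
filtered = [c for c in cols if not any(s in c for s in skip)]

print(always_true)  # ['Nanonose', 'Unnamed: 1', 'Unnamed: 2', 'A', 'B']
print(filtered)     # ['A', 'B']
```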