pandas: How to change the header row in a Python DataFrame

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/33540961/

Time: 2020-09-14 00:10:28  Source: igfitidea

How to change header row in a python dataframe

Tags: python, pandas

Asked by Oaka13

I'm having trouble changing the header row of an existing DataFrame using pandas in Python. After importing pandas and reading the CSV file, I set the header row to None so that I could remove duplicate dates after transposing. However, this leaves me with a header row (and in fact an index column) that I do not want.

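import pandas as pd

df = pd.read_csv(spreadfile, header=None)

# transpose, then drop rows with duplicate dates in column 0
# (keep='last' replaces the removed take_last=True argument)
df2 = df.T.drop_duplicates(subset=[0], keep='last')
del df2[1]

# .iloc replaces the removed .ix indexer
indcol = df2.iloc[:, 0]
df3 = df2.reindex(indcol)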

The above unimaginative code, however, fails on two counts. The index column is now the required one, but all entries are now NaN. My understanding of Python is not yet good enough to recognise what Python is doing. The desired output below is what I need; any help would be greatly appreciated!

df2 before reindexing:

           0             2             3             4             5
0        NaN  XS0089553282  XS0089773484  XS0092157600  XS0092541969
1  01-May-14         131.7         165.1         151.8          88.9
3  02-May-14           131         164.9         151.7          88.5
5  05-May-14         131.1           165         151.8          88.6
7  06-May-14         129.9         163.4         151.2          87.1

df2 after reindexing:

             0    2    3    4    5
0                                 
NaN        NaN  NaN  NaN  NaN  NaN
01-May-14  NaN  NaN  NaN  NaN  NaN
02-May-14  NaN  NaN  NaN  NaN  NaN
05-May-14  NaN  NaN  NaN  NaN  NaN
06-May-14  NaN  NaN  NaN  NaN  NaN

df2 desired:

           XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1

Accepted answer by EdChum

Assign the columns directly:

# .iloc replaces the .ix indexer, which was removed in pandas 1.0
indcol = df2.iloc[:, 0]
df2.columns = indcol

The problem with reindex is that it uses the existing index and column values of your df, so the new values you pass in don't exist there, which is why you get all NaNs.
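To make the difference concrete, here is a minimal sketch on a toy frame. The data and the variable names are made up for illustration and are not taken from the question's file. reindex looks up the labels you pass among the existing row labels and fills anything it cannot find with NaN, whereas set_index (or assigning to .index) relabels the rows and keeps the data attached.

```python
import pandas as pd

# toy data standing in for the question's frame (values are illustrative)
df2 = pd.DataFrame({0: ["01-May-14", "02-May-14"], 2: [131.7, 131.0]})

# reindex selects rows by label: the existing labels are 0 and 1,
# so none of the date strings match and every cell comes back as NaN
reindexed = df2.reindex(df2[0])

# set_index moves column 0 into the index and keeps the data attached
relabelled = df2.set_index(0)

print(reindexed)
print(relabelled)
```

The first print shows a frame of NaNs with the dates as row labels, which is the symptom described in the question; the second shows the dates as row labels with the original values intact.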

A simpler approach to what you're trying to do:

In [147]:
# take the cols and index values of interest
cols = df.loc[0, '2':]
idx = df['0'].iloc[1:]
print(cols)
print(idx)

2    XS0089553282
3    XS0089773484
4    XS0092157600
5    XS0092541969
Name: 0, dtype: object

1    01-May-14
3    02-May-14
5    05-May-14
7    06-May-14
Name: 0, dtype: object

In [157]:
# drop the first row and the first column
df2 = df.drop('0', axis=1).drop(0)
# overwrite the index values
df2.index = idx.values
df2

Out[157]:
               2      3      4     5
01-May-14  131.7  165.1  151.8  88.9
02-May-14    131  164.9  151.7  88.5
05-May-14  131.1    165  151.8  88.6
06-May-14  129.9  163.4  151.2  87.1

In [158]:
# now overwrite the column values    
df2.columns = cols.values
df2

Out[158]:
          XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14        131.7        165.1        151.8         88.9
02-May-14          131        164.9        151.7         88.5
05-May-14        131.1          165        151.8         88.6
06-May-14        129.9        163.4        151.2         87.1
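For anyone who wants to run the steps above end to end, here is a self-contained sketch. It assumes integer column labels (which is what header=None produces) rather than the string labels used in the session above, and the small frame named raw is illustrative data, not the asker's file. The final pd.to_numeric step is an addition of mine: because the identifiers and the numbers shared columns, the cells are still strings after relabelling.

```python
import numpy as np
import pandas as pd

# illustrative frame shaped like the question's df2: row 0 holds the
# identifiers, column 0 holds the dates (integer labels, as header=None gives)
raw = pd.DataFrame(
    [
        [np.nan, "XS0089553282", "XS0089773484"],
        ["01-May-14", "131.7", "165.1"],
        ["02-May-14", "131", "164.9"],
    ]
)

cols = raw.iloc[0, 1:]   # identifiers from the first row
idx = raw.iloc[1:, 0]    # dates from the first column

# drop the header row and the date column, then relabel both axes
tidy = raw.iloc[1:, 1:].copy()
tidy.index = idx.values
tidy.columns = cols.values

# the cells are still strings at this point; convert them to numbers
tidy = tidy.apply(pd.to_numeric)
print(tidy)
```

Using .values on both assignments matters: it hands pandas plain arrays, so the old labels of cols and idx do not leak into the new axes.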

Answer by Nader Hisham

In [310]:
cols = df.iloc[0 , 1:]
cols
Out[310]:
1    XS0089553282
2    XS0089773484
3    XS0092157600
4    XS0092541969
Name: 0, dtype: object

In [311]:
df.drop(0 , inplace=True)
df
Out[311]:
           0      1      2      3     4
1  01-May-14  131.7  165.1  151.8  88.9
2  02-May-14    131  164.9  151.7  88.5
3  05-May-14  131.1    165  151.8  88.6
4  06-May-14  129.9  163.4  151.2  87.1

In [312]:
df.set_index(0 , inplace=True)
df

Out[312]:
               1      2      3     4
0
01-May-14  131.7  165.1  151.8  88.9
02-May-14    131  164.9  151.7  88.5
05-May-14  131.1    165  151.8  88.6
06-May-14  129.9  163.4  151.2  87.1

In [315]:
df.columns = cols
df
Out[315]:
          XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14        131.7        165.1        151.8         88.9
02-May-14          131        164.9        151.7         88.5
05-May-14        131.1          165        151.8         88.6
06-May-14        129.9        163.4        151.2         87.1
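The same steps can also be written without inplace=True, so the original frame is left untouched. The sketch below wraps them in a helper called promote_header; that name is hypothetical (it is not a pandas function), and the sample frame is illustrative data shaped like the one in this answer.

```python
import numpy as np
import pandas as pd


def promote_header(frame):
    """Return a new frame whose first row is the header and first column the index.

    Hypothetical helper for illustration, not part of pandas.
    """
    header = frame.iloc[0, 1:]                         # labels for the data columns
    body = frame.iloc[1:].set_index(frame.columns[0])  # first column becomes the index
    body.columns = header.to_numpy()
    body.index.name = None                             # drop the leftover "0" name
    return body


# illustrative data shaped like the frame in the answer above
sample = pd.DataFrame(
    [
        [np.nan, "XS0089553282", "XS0089773484"],
        ["01-May-14", "131.7", "165.1"],
        ["02-May-14", "131", "164.9"],
    ]
)

result = promote_header(sample)
print(result)
```

Returning a new frame instead of mutating df makes the steps safe to re-run in a notebook, since a second call does not drop another row from the original.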