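pandas: how to change the header row in a Python DataFrame
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, released under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/33540961/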
Question by Oaka13
I'm having trouble changing the header row of an existing DataFrame using pandas in Python. After importing pandas and reading the CSV file, I set header to None so that I could remove duplicate dates after transposing. However, this leaves me with a header row (and in fact an index column) that I do not want.
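import pandas as pd
df = pd.read_csv(spreadfile, header=None)
# keep='last' replaces the take_last=True argument, which current pandas no longer accepts
df2 = df.T.drop_duplicates([0], keep='last')
del df2[1]
indcol = df2.iloc[:, 0]  # .ix has been removed from pandas; iloc selects by position
df3 = df2.reindex(indcol)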
However, the unimaginative code above fails on two counts. The index column is now the one I need, but all entries are now NaN. My understanding of Python is not yet good enough to see what it is doing. The desired output below is what I need; any help would be greatly appreciated!
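The question relies on take_last=True, which pandas has since replaced with keep='last'. As a minimal sketch of the transpose-then-deduplicate step with the current keyword, here is a tiny hypothetical frame (its layout is an assumption, not the asker's actual file) in which one date appears twice:

```python
import pandas as pd

# Hypothetical raw layout (an assumption, not the asker's file): row 0 holds
# dates, with "02-May-14" repeated, and row 1 holds one price series.
raw = pd.DataFrame([
    ["01-May-14", "02-May-14", "02-May-14", "05-May-14"],
    [131.7, 130.0, 131.0, 131.1],
])

# Transpose so each date becomes a row, then keep only the LAST row for each
# repeated value in column 0 (the modern spelling of take_last=True).
deduped = raw.T.drop_duplicates(subset=[0], keep="last")
print(deduped)
```

The first "02-May-14" row (label 1) is dropped and the later one (label 2) survives, so the surviving row labels are no longer consecutive, just as in the df2 printout below.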
df2 before reindexing:
0 2 3 4 5
0 NaN XS0089553282 XS0089773484 XS0092157600 XS0092541969
1 01-May-14 131.7 165.1 151.8 88.9
3 02-May-14 131 164.9 151.7 88.5
5 05-May-14 131.1 165 151.8 88.6
7 06-May-14 129.9 163.4 151.2 87.1
df2 after reindexing:
0 2 3 4 5
0
NaN NaN NaN NaN NaN NaN
01-May-14 NaN NaN NaN NaN NaN
02-May-14 NaN NaN NaN NaN NaN
05-May-14 NaN NaN NaN NaN NaN
06-May-14 NaN NaN NaN NaN NaN
Desired df2:
XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
Accepted answer by EdChum
Assign the columns directly:
indcol = df2.iloc[:, 0]  # .ix has been removed from pandas; iloc selects by position
df2.columns = indcol
The problem with reindex is that it uses the existing index and column values of your df: the new values you pass in don't exist among those labels, which is why you get all NaNs.
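To see the difference between reindex and plain assignment in isolation, here is a small sketch on a hypothetical two-row frame (the labels and values are made up for illustration). reindex looks the new labels up in the existing index, while assigning to .index just renames the rows in order:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})  # default index: 0, 1
new_labels = ["01-May-14", "02-May-14"]  # hypothetical date labels

# reindex searches the EXISTING index (0, 1) for each new label; none are
# found, so every row comes back filled with NaN.
looked_up = df.reindex(new_labels)

# Assigning to .index keeps the data and only replaces the row labels.
relabelled = df.copy()
relabelled.index = new_labels

print(looked_up)
print(relabelled)
```

The same applies to columns: df.reindex(columns=...) also yields NaN for labels it cannot find, whereas df.columns = ... relabels the existing columns in order.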
A simpler approach to what you're trying to do:
In [147]:
# take the cols and index values of interest
cols = df.loc[0, '2':]
idx = df['0'].iloc[1:]
print(cols)
print(idx)
2 XS0089553282
3 XS0089773484
4 XS0092157600
5 XS0092541969
Name: 0, dtype: object
1 01-May-14
3 02-May-14
5 05-May-14
7 06-May-14
Name: 0, dtype: object
In [157]:
# drop the first row and the first column
df2 = df.drop('0', axis=1).drop(0)
# overwrite the index values
df2.index = idx.values
df2
Out[157]:
2 3 4 5
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
In [158]:
# now overwrite the column values
df2.columns = cols.values
df2
Out[158]:
XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
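For reference, the steps of this answer can be run end to end as a sketch. It rebuilds df2 from the "before reindexing" printout in the question and selects by position with iloc, rather than by the string labels used in the answer's session. Storing the cells as strings, and the final pd.to_numeric conversion, are assumptions added here rather than part of the answer:

```python
import numpy as np
import pandas as pd

# Rebuild df2 "before reindexing" from the question's printout. Cells are
# strings on the assumption that the ISIN row forced object columns in read_csv.
df2 = pd.DataFrame(
    [
        [np.nan, "XS0089553282", "XS0089773484", "XS0092157600", "XS0092541969"],
        ["01-May-14", "131.7", "165.1", "151.8", "88.9"],
        ["02-May-14", "131", "164.9", "151.7", "88.5"],
        ["05-May-14", "131.1", "165", "151.8", "88.6"],
        ["06-May-14", "129.9", "163.4", "151.2", "87.1"],
    ],
    index=[0, 1, 3, 5, 7],
    columns=[0, 2, 3, 4, 5],
)

cols = df2.iloc[0, 1:]  # first row minus the corner cell: new column labels
idx = df2.iloc[1:, 0]   # first column minus the corner cell: new row labels

# Drop the first row and first column, then overwrite both axes.
result = df2.iloc[1:, 1:].copy()
result.index = idx.values
result.columns = cols.values

# Extra step, not in the answer: convert the string cells to numbers.
result = result.apply(pd.to_numeric)
print(result)
```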
Answer by Nader Hisham
In [310]:
cols = df.iloc[0 , 1:]
cols
Out[310]:
1 XS0089553282
2 XS0089773484
3 XS0092157600
4 XS0092541969
Name: 0, dtype: object
In [311]:
df.drop(0 , inplace=True)
df
Out[311]:
0 1 2 3 4
1 01-May-14 131.7 165.1 151.8 88.9
2 02-May-14 131 164.9 151.7 88.5
3 05-May-14 131.1 165 151.8 88.6
4 06-May-14 129.9 163.4 151.2 87.1
In [312]:
df.set_index(0 , inplace=True)
df
Out[312]:
1 2 3 4
0
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
In [315]:
df
df.columns = cols
df
Out[315]:
XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
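The same three steps can also be chained without inplace=True, which leaves the original df untouched. This is a sketch assuming pandas 1.0 or later (for the non-inplace set_axis), with the starting frame typed in from the outputs above:

```python
import numpy as np
import pandas as pd

# Starting frame as in this answer: default 0..4 labels on both axes.
df = pd.DataFrame([
    [np.nan, "XS0089553282", "XS0089773484", "XS0092157600", "XS0092541969"],
    ["01-May-14", 131.7, 165.1, 151.8, 88.9],
    ["02-May-14", 131.0, 164.9, 151.7, 88.5],
    ["05-May-14", 131.1, 165.0, 151.8, 88.6],
    ["06-May-14", 129.9, 163.4, 151.2, 87.1],
])

cols = df.iloc[0, 1:].tolist()  # ISIN codes from the first row

# Each call returns a new frame instead of modifying df in place.
result = (
    df.drop(index=0)             # remove the row that holds the ISIN codes
      .set_index(0)              # use the date column as the row index
      .set_axis(cols, axis=1)    # replace column labels 1..4 with the ISINs
)
print(result)
```

The index keeps the name 0 left over from set_index(0), and the cells stay object dtype because the first row held strings.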