Original URL: http://stackoverflow.com/questions/16591923/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
rename index of a pandas dataframe
Question by user1486457
I have a pandas dataframe whose indices look like:
df.index
['a_1', 'b_2', 'c_3', ... ]
I want to rename these indices to:
['a', 'b', 'c', ... ]
How do I do this without specifying a dictionary with explicit keys for each index value?
I tried:
df.rename(index=lambda x: x.split('_')[0])
but this throws up an error:
AssertionError: New axis must be unique to rename
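The assertion means the lambda maps two different labels to the same new label. With an index like the one used in the answers, `'c_3'` and `'c_4'` both become `'c'`. A minimal sketch for spotting such collisions before renaming, assuming that four-row index (the variable names are illustrative). More recent pandas releases may not raise this assertion at all and may instead quietly produce duplicate labels, so an explicit uniqueness check can be worthwhile either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4, 3)), index=["a_1", "b_2", "c_3", "c_4"])

# Compute the labels rename() would produce, without modifying df
new_labels = df.index.map(lambda x: x.split("_")[0])

# False here: two original labels collapse onto the same new label
print(new_labels.is_unique)

# keep=False flags every member of a duplicate group, not only the repeats
collisions = df.index[new_labels.duplicated(keep=False)]
print(collisions.tolist())  # ['c_3', 'c_4']
```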
Answer by unutbu
Perhaps you could get the best of both worlds by using a MultiIndex:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(8).reshape(4,2), index=['a_1', 'b_2', 'c_3', 'c_4'])
print(df)
#      0  1
# a_1  0  1
# b_2  2  3
# c_3  4  5
# c_4  6  7
index = pd.MultiIndex.from_tuples([item.split('_') for item in df.index])
df.index = index
print(df)
#      0  1
# a 1  0  1
# b 2  2  3
# c 3  4  5
#   4  6  7
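Assuming your pandas version provides the `.str` accessor on an `Index`, the same two-level index can be built without the explicit list comprehension and `from_tuples` call: `Index.str.split` with `expand=True` returns a `MultiIndex` directly. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=["a_1", "b_2", "c_3", "c_4"])

# On an Index, expand=True turns the pieces of each split into MultiIndex levels
df.index = df.index.str.split("_", expand=True)

print(df.index.nlevels)                        # 2
print(df.index.get_level_values(0).tolist())  # ['a', 'b', 'c', 'c']
print(df.index.get_level_values(1).tolist())  # ['1', '2', '3', '4']
```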
This way, you can access rows according to the first level of the index:
In [30]: df.loc['c']
Out[30]:
   0  1
3  4  5
4  6  7
or according to both levels of the index:
In [31]: df.loc[('c','3')]
Out[31]:
0    4
1    5
Name: (c, 3)
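`.ix` was deprecated and has been removed as of pandas 1.0; `.loc` performs the same label-based lookups on a MultiIndex. To select on the second level alone, `DataFrame.xs` with its `level` argument is one more option. A sketch that rebuilds the same frame so it runs on its own:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples([("a", "1"), ("b", "2"), ("c", "3"), ("c", "4")])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index)

# All rows whose first level is 'c'; that level is dropped from the result
by_first = df.loc["c"]
print(by_first)

# One row addressed by both levels comes back as a Series
row = df.loc[("c", "3")]
print(row.tolist())  # [4, 5]

# Rows whose second level is '3', whatever the first level is
by_second = df.xs("3", level=1)
print(by_second)
```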
Moreover, DataFrame methods are built to work with MultiIndexed DataFrames, so you lose nothing by keeping both levels.
However, if you really want to drop the second level of the index, you could do this:
df.reset_index(level=1, drop=True, inplace=True)
print(df)
#    0  1
# a  0  1
# b  2  3
# c  4  5
# c  6  7
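On pandas 0.24 or later (an assumption about your installed version), `DataFrame.droplevel` removes an index level just like the `reset_index(level=1, drop=True)` call, but returns a new frame rather than modifying `df` in place. A minimal sketch:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples([("a", "1"), ("b", "2"), ("c", "3"), ("c", "4")])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index)

# Drop the second index level (position 1); df itself keeps both levels
flat = df.droplevel(1)

print(flat.index.tolist())   # ['a', 'b', 'c', 'c']
print(flat.index.is_unique)  # False: 'c' now appears twice
print(df.index.nlevels)      # 2
```

Returning a new object keeps the MultiIndexed `df` available for the two-level lookups while `flat` serves code that expects plain labels.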
Answer by DSM
That's the error you'd get if your function produced duplicate index values:
>>> df = pd.DataFrame(np.random.random((4,3)),index="a_1 b_2 c_3 c_4".split())
>>> df
            0         1         2
a_1  0.854839  0.830317  0.046283
b_2  0.433805  0.629118  0.702179
c_3  0.390390  0.374232  0.040998
c_4  0.667013  0.368870  0.637276
>>> df.rename(index=lambda x: x.split("_")[0])
[...]
AssertionError: New axis must be unique to rename
If you really want that, I'd use a list comprehension:
>>> df.index = [x.split("_")[0] for x in df.index]
>>> df
          0         1         2
a  0.854839  0.830317  0.046283
b  0.433805  0.629118  0.702179
c  0.390390  0.374232  0.040998
c  0.667013  0.368870  0.637276
but I'd think about whether that's really the right direction.
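If you do go ahead, the list comprehension can also be replaced with the vectorized string methods on the index, again assuming a pandas version whose `Index` has the `.str` accessor. A sketch producing the same labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4, 3)), index="a_1 b_2 c_3 c_4".split())

# split() yields a list per label; .str[0] keeps the part before the first '_'
df.index = df.index.str.split("_").str[0]

print(df.index.tolist())  # ['a', 'b', 'c', 'c']
```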

