Original URL: http://stackoverflow.com/questions/20529619/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Renaming index values in multiindex dataframe
Question by tnknepp
Creating my dataframe:
import numpy as np
from pandas import DataFrame, MultiIndex
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))  # materialize the pairs (zip returns an iterator on Python 3)
index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
data = DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])
data
Out[68]:
                    c1        c2
first second
bar   one     0.833816 -1.529639
      two     0.340150 -1.818052
baz   one    -1.605051 -0.917619
      two    -0.021386 -0.222951
foo   one     0.143949 -0.406376
      two     1.208358 -2.469746
qux   one    -0.345265 -0.505282
      two     0.158928  1.088826
I would like to rename the "first" index values, such as "bar"->"cat", "baz"->"dog", etc. However, every example I have read either operates on a single-level index and/or loops through the entire index to effectively re-create it from scratch. I was thinking something like:
data = data.reindex(index={'bar':'cat','baz':'dog'})
but this does not work, nor do I really expect it to work on multiple indexes. Can I do such a replacement without looping through the entire dataframe index?
Begin edit
I am hesitant to update to 0.13 until it is released, so I used the following workaround:
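import pandas as pd
# codes: a previously defined dict mapping each old first-level value to its new name
index = data.index.tolist()
for r in range(len(index)):  # xrange on Python 2
    index[r] = (codes[index[r][0]], index[r][1])
index = pd.MultiIndex.from_tuples(index, names=data.index.names)
data.index = index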
Here codes is a previously defined dictionary of code:string pairs. This actually isn't as big a performance hit as I was expecting (it takes a couple of seconds to process ~1.1 million rows). It is not as pretty as a one-liner, but it does work.
End Edit
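The loop in the edit above can also be written without iterating over the index in Python. A minimal sketch, assuming a pandas version where get_level_values and Index.map (with a dict) are available, as in current releases; the codes dict here is a hypothetical stand-in for the asker's own mapping. Unlike the loop, which raises KeyError for a missing key, Index.map turns unmapped values into NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

# Hypothetical stand-in for the asker's `codes` dict (old value -> new value).
codes = {'bar': 'cat', 'baz': 'dog', 'foo': 'foo', 'qux': 'qux'}

# Map every first-level value in one vectorized call instead of a Python loop.
# Index.map with a dict turns values missing from the dict into NaN.
first = data.index.get_level_values('first').map(codes)
second = data.index.get_level_values('second')
data.index = pd.MultiIndex.from_arrays([first, second], names=data.index.names)

print(data.index.get_level_values('first').tolist())
# ['cat', 'cat', 'dog', 'dog', 'foo', 'foo', 'qux', 'qux']
```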
Answer by unutbu
Use the set_levels method (new in version 0.13.0):
# Recent pandas versions removed the inplace= keyword here, so assign the result back
data.index = data.index.set_levels([['cat', 'dog', 'foo', 'qux'],
                                    ['one', 'two']])
yields
                    c1        c2
first second
cat   one    -0.289649 -0.870716
      two    -0.062014 -0.410274
dog   one     0.030171 -1.091150
      two     0.505408  1.531108
foo   one     1.375653 -1.377876
      two    -1.478615  1.351428
qux   one     1.075802  0.532416
      two     0.865931 -0.765292
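set_levels assigns the new values by position against the existing level, which pandas keeps in sorted order, so deriving the new values from data.index.levels avoids lining them up by hand. A minimal sketch, assuming a pandas version whose set_levels accepts a level= keyword (current releases do); the rename dict is hypothetical.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

# Inspect the current order of the level: new values are matched to it by position.
print(list(data.index.levels[0]))  # ['bar', 'baz', 'foo', 'qux']

# Build the new level from the old one so positions cannot drift,
# and replace only the 'first' level via the level= keyword.
rename = {'bar': 'cat', 'baz': 'dog'}  # hypothetical mapping
new_first = [rename.get(v, v) for v in data.index.levels[0]]
data.index = data.index.set_levels(new_first, level='first')

print(data.index.get_level_values('first').tolist())
# ['cat', 'cat', 'dog', 'dog', 'foo', 'foo', 'qux', 'qux']
```

Only the listed values change; 'foo' and 'qux' pass through rename.get unchanged.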
To remap a level based on a dict, you could use a function such as this:
def map_level(df, dct, level=0):
    index = df.index
    # Rename the values of one level; values missing from dct are kept as-is
    df.index = index.set_levels(
        [[dct.get(item, item) for item in names] if i == level else names
         for i, names in enumerate(index.levels)])
dct = {'bar':'cat', 'baz':'dog'}
map_level(data, dct, level=0)
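In later pandas versions, DataFrame.rename can do the same dict-based remapping directly through its level parameter, which restricts the mapping to one level of a MultiIndex. A minimal sketch, assuming such a version; it returns a new frame rather than modifying the original, and labels missing from the dict are left unchanged.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

dct = {'bar': 'cat', 'baz': 'dog'}
# level= limits the mapping to the 'first' level; 'data' itself is not modified.
renamed = data.rename(index=dct, level='first')

print(renamed.index.get_level_values('first').tolist())
# ['cat', 'cat', 'dog', 'dog', 'foo', 'foo', 'qux', 'qux']
```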
Here's a runnable example:
import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))  # zip returns an iterator on Python 3
index = pd.MultiIndex.from_tuples(tuples, names=['first','second'])
data = pd.DataFrame(np.random.randn(8,2),index=index,columns=['c1','c2'])
data2 = data.copy()
# Assign the result back (recent pandas versions removed inplace= here)
data.index = data.index.set_levels([['cat', 'dog', 'foo', 'qux'],
                                    ['one', 'two']])
print(data)
#                     c1        c2
# first second
# cat   one     0.939040 -0.748100
#       two    -0.497006 -1.185966
# dog   one    -0.368161  0.050339
#       two    -2.356879 -0.291206
# foo   one    -0.556261  0.474297
#       two     0.647973  0.755983
# qux   one    -0.017722  1.364244
#       two     1.007303  0.004337
def map_level(df, dct, level=0):
    index = df.index
    # Rename the values of one level; values missing from dct are kept as-is
    df.index = index.set_levels(
        [[dct.get(item, item) for item in names] if i == level else names
         for i, names in enumerate(index.levels)])
dct = {'bar':'wolf', 'baz':'rabbit'}
map_level(data2, dct, level=0)
print(data2)
#                      c1        c2
# first  second
# wolf   one     0.939040 -0.748100
#        two    -0.497006 -1.185966
# rabbit one    -0.368161  0.050339
#        two    -2.356879 -0.291206
# foo    one    -0.556261  0.474297
#        two     0.647973  0.755983
# qux    one    -0.017722  1.364244
#        two     1.007303  0.004337
Answer by AlexG
The set_levels method was putting my new column names out of order, so I found a different solution that isn't very clean but works well: print df.index (or, equivalently, df.columns), then copy and paste the output with the desired values changed. For example:
print(data.index)
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']], labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]], names=['first', 'second'])
# pandas 0.24 renamed the labels= argument to codes= (labels= was later removed)
data.index = MultiIndex(levels=[['new_bar', 'new_baz', 'new_foo', 'new_qux'],
                                ['new_one', 'new_two']],
                        codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
                        names=['first', 'second'])
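Instead of copying the printed output by hand, the same constructor can be given the existing codes and names straight from the current index, so only the level values need to change. A minimal sketch, assuming pandas 0.24 or later (where the argument is named codes); the new_ prefix just mirrors the example's naming.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

old = data.index
# Reuse the existing integer codes and names; only the level values change.
data.index = pd.MultiIndex(
    levels=[['new_' + v for v in old.levels[0]],
            ['new_' + v for v in old.levels[1]]],
    codes=old.codes,
    names=old.names,
)

print(data.index[0])  # ('new_bar', 'new_one')
```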
We can take full control over the names by editing the labels (called codes in pandas 0.24 and later) as well. For example:
# Each row now gets its own second-level label (codes= replaces labels= in pandas >= 0.24)
data.index = MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'],
                                ['one', 'twooo', 'three', 'four',
                                 'five', 'six', 'seven', 'eight']],
                        codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5, 6, 7]],
                        names=['first', 'second'])
Note that these examples assume we have already done something like from pandas import MultiIndex or from pandas import *.
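A per-row relabeling like the last example can also be built without writing integer codes by hand, by letting MultiIndex.from_arrays derive the levels and codes from plain lists. A minimal sketch; the eight second-level labels are illustrative only. from_arrays may store the level values in a different (sorted) order with correspondingly different codes, but each row ends up with the same label.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

# Illustrative per-row labels for the second level (one per row).
second = ['one', 'twooo', 'three', 'four', 'five', 'six', 'seven', 'eight']

# from_arrays computes levels and codes itself; the first level is kept unchanged.
data.index = pd.MultiIndex.from_arrays(
    [data.index.get_level_values('first'), second],
    names=data.index.names)

print(data.index.get_level_values('second').tolist())
# ['one', 'twooo', 'three', 'four', 'five', 'six', 'seven', 'eight']
```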

