pandas: renaming index values in a MultiIndex DataFrame

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/20529619/

Time: 2020-09-13 21:26:17  Source: igfitidea

Renaming index values in multiindex dataframe

Tags: python-2.7, pandas

Asked by tnknepp

Creating my dataframe:

import numpy as np
from pandas import DataFrame, MultiIndex

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

# list() so the tuples are materialized on Python 3 as well
tuples = list(zip(*arrays))

index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
data = DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

data
Out[68]: 
                    c1        c2
first second                    
bar   one     0.833816 -1.529639
      two     0.340150 -1.818052
baz   one    -1.605051 -0.917619
      two    -0.021386 -0.222951
foo   one     0.143949 -0.406376
      two     1.208358 -2.469746
qux   one    -0.345265 -0.505282
      two     0.158928  1.088826

I would like to rename the "first" index values, such as "bar"->"cat", "baz"->"dog", etc. However, every example I have read either operates on a single-level index and/or loops through the entire index to effectively re-create it from scratch. I was thinking something like:

data = data.reindex(index={'bar':'cat','baz':'dog'})

but this does not work, nor do I really expect it to work on multiple indexes. Can I do such a replacement without looping through the entire dataframe index?

Begin edit

I am hesitant to update to 0.13 until release, so I used the following workaround:

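# codes is a previously defined dict mapping old labels to new ones
index = data.index.tolist()
for r in range(len(index)):
    # replace only the first-level label of each tuple
    index[r] = (codes[index[r][0]], index[r][1])

index = MultiIndex.from_tuples(index, names=data.index.names)
data.index = index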

Where codes is a previously defined dictionary of code:string pairs. This actually isn't as big of a performance hit as I was expecting (it takes a couple of seconds to operate over ~1.1 million rows). It is not as pretty as a one-liner, but it does work.
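
For reference, a self-contained sketch of the same workaround. The contents of the codes dictionary are not shown in the question, so the mapping below is purely hypothetical, and the sample frame is rebuilt with MultiIndex.from_arrays only to keep the example short.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

# Hypothetical mapping; the real `codes` dictionary is not given in the question.
codes = {'bar': 'cat', 'baz': 'dog', 'foo': 'emu', 'qux': 'fox'}

# Rebuild every index tuple, translating only its first element.
new_tuples = [(codes[first], second) for first, second in data.index.tolist()]
data.index = pd.MultiIndex.from_tuples(new_tuples, names=data.index.names)
```

Like the loop above, this still walks the whole index once; it only avoids mutating the list in place.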

End Edit

Answered by unutbu

Use the set_levels method (new in version 0.13.0):

# set_levels returns a new MultiIndex, so assign it back to data.index
data.index = data.index.set_levels([[u'cat', u'dog', u'foo', u'qux'],
                                    [u'one', u'two']])

yields

                    c1        c2
first second                    
cat   one    -0.289649 -0.870716
      two    -0.062014 -0.410274
dog   one     0.030171 -1.091150
      two     0.505408  1.531108
foo   one     1.375653 -1.377876
      two    -1.478615  1.351428
qux   one     1.075802  0.532416
      two     0.865931 -0.765292
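
To see why replacing the levels is enough, it may help to look at how a MultiIndex stores its labels. The sketch below assumes a pandas version that exposes MultiIndex.codes (0.24 or later) and a set_levels that accepts a level keyword. The second half illustrates one way new labels can end up attached to the wrong rows: levels are stored in sorted order, not in order of appearance.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# Each level keeps its unique labels once; the rows only store integer
# positions (codes) that point into those labels.
first_labels = list(index.levels[0])
first_codes = list(index.codes[0])

# Replacing the labels of one level leaves the codes untouched,
# so every row picks up its new name without rebuilding the tuples.
renamed = index.set_levels(['cat', 'dog', 'foo', 'qux'], level='first')

# Pitfall: levels are sorted, so new labels must follow the order of
# index.levels, not the order in which the values appear in the rows.
unsorted = pd.MultiIndex.from_arrays([['qux', 'bar'], ['one', 'one']])
wrong = unsorted.set_levels(['new_qux', 'new_bar'], level=0)
```

Here the row that was 'qux' ends up labelled 'new_bar', because 'bar' sorts first in unsorted.levels[0].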


To remap a level based on a dict, you could use a function such as this:

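def map_level(df, dct, level=0):
    # Translate the labels of one level through dct; labels missing from dct
    # and all other levels are left unchanged. The result is assigned back
    # to df.index, so df itself is modified.
    index = df.index
    df.index = index.set_levels([[dct.get(item, item) for item in names] if i==level else names
                                 for i, names in enumerate(index.levels)])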

dct = {'bar':'cat', 'baz':'dog'}
map_level(data, dct, level=0)
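
As a possible alternative to a hand-written helper, DataFrame.rename accepts a level argument in more recent pandas releases. A minimal sketch, assuming such a version is installed; the sample frame is rebuilt here only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

dct = {'bar': 'cat', 'baz': 'dog'}

# Apply the mapping to the 'first' level only; labels that are not keys
# of dct ('foo', 'qux') are kept. A new DataFrame is returned.
renamed = data.rename(index=dct, level='first')
```

Unlike map_level, this leaves data untouched and returns the renamed frame.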


Here's a runnable example:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first','second'])
data = pd.DataFrame(np.random.randn(8,2),index=index,columns=['c1','c2'])
data2 = data.copy()

data.index = data.index.set_levels([[u'cat', u'dog', u'foo', u'qux'],
                                    [u'one', u'two']])
print(data)
#                     c1        c2
# first second                    
# cat   one     0.939040 -0.748100
#       two    -0.497006 -1.185966
# dog   one    -0.368161  0.050339
#       two    -2.356879 -0.291206
# foo   one    -0.556261  0.474297
#       two     0.647973  0.755983
# qux   one    -0.017722  1.364244
#       two     1.007303  0.004337

def map_level(df, dct, level=0):
    # Translate one level's labels through dct and assign the new index back to df.
    index = df.index
    df.index = index.set_levels([[dct.get(item, item) for item in names] if i==level else names
                                 for i, names in enumerate(index.levels)])
dct = {'bar':'wolf', 'baz':'rabbit'}
map_level(data2, dct, level=0)
print(data2)
#                      c1        c2
# first  second                    
# wolf   one     0.939040 -0.748100
#        two    -0.497006 -1.185966
# rabbit one    -0.368161  0.050339
#        two    -2.356879 -0.291206
# foo    one    -0.556261  0.474297
#        two     0.647973  0.755983
# qux    one    -0.017722  1.364244
#        two     1.007303  0.004337

Answered by AlexG

The set_levels method was causing my new column names to be out of order. So I found a different solution that isn't very clean, but works well. The method is to print df.index (or equivalently df.columns) and then copy and paste the output with the desired values changed. For example:

print(data.index)

MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']], labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]], names=['first', 'second'])

data.index = MultiIndex(levels=[['new_bar', 'new_baz', 'new_foo', 'new_qux'],
                                ['new_one', 'new_two']],
                        labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
                        names=['first', 'second'])

We can have full control over names by editing the labels as well. For example:

data.index = MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'],
                                ['one', 'twooo', 'three', 'four',
                                 'five', 'siz', 'seven', 'eit']],
                        labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5, 6, 7]],
                        names=['first', 'second'])

Note that in this example we have already done something like from pandas import MultiIndex or from pandas import *.
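
On newer pandas releases the constructor keyword used above has a different name. A minimal sketch of the first example, assuming pandas 0.24 or later, where the labels argument is called codes:

```python
import pandas as pd

# Same levels and integer positions as in the example above; only the
# keyword name changes (codes instead of labels).
new_index = pd.MultiIndex(
    levels=[['new_bar', 'new_baz', 'new_foo', 'new_qux'],
            ['new_one', 'new_two']],
    codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
    names=['first', 'second'])
```

The result can then be assigned with data.index = new_index, provided data has the same number of rows as the index.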
