pandas get_level_values for multiple columns

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/39080555/

Time: 2020-09-14 01:52:18  Source: igfitidea

Tags: python, python-3.x, pandas, dataframe, multi-index

Asked by danielhadar

Is there a way to get the result of get_level_values for more than one column?

Given the following DataFrame:

         d
a b c     
1 4 10  16
    11  17
  5 12  18
2 5 13  19
  6 14  20
3 7 15  21

I wish to get the values (i.e. a list of tuples) of levels a and c:

[(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

Notes:

  • It is impossible to pass more than one level to get_level_values (e.g. df.index.get_level_values(['a','c'])).

  • There's a workaround in which one could call get_level_values for each desired level and zip the results together:

For example:

a_list = df.index.get_level_values('a').values
c_list = df.index.get_level_values('c').values

print(list(zip(a_list, c_list)))
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

but it gets cumbersome as the number of columns grows.

  • The code to build the example DataFrame:

import pandas as pd

df = pd.DataFrame({'a':[1,1,1,2,2,3],'b':[4,4,5,5,6,7],'c':[10,11,12,13,14,15], 'd':[16,17,18,19,20,21]}).set_index(['a','b','c'])
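
The zip workaround described in the notes can be extended to any number of levels by looping over a list of level names instead of writing one get_level_values call per level. This is only a minimal sketch of that idea; the variable `names` is an illustrative name, not something from the original question.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 3],
    'b': [4, 4, 5, 5, 6, 7],
    'c': [10, 11, 12, 13, 14, 15],
    'd': [16, 17, 18, 19, 20, 21],
}).set_index(['a', 'b', 'c'])

# Illustrative: the index levels to extract, in the desired output order.
names = ['a', 'c']

# One get_level_values call per requested level, zipped row by row.
pairs = list(zip(*(df.index.get_level_values(n).tolist() for n in names)))
print(pairs)
```

Because the levels are looked up by name one at a time, the tuples follow the order of `names` rather than the order of the levels in the index.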

Accepted answer by Alberto Garcia-Raboso

The .tolist() method of a MultiIndex gives a list of tuples for all the levels in the MultiIndex. For example, with your example DataFrame,

df.index.tolist()
# => [(1, 4, 10), (1, 4, 11), (1, 5, 12), (2, 5, 13), (2, 6, 14), (3, 7, 15)]

So here are two ideas:

  1. Get the list of tuples from the original MultiIndex and filter the result.

    [(a, c) for a, b, c in df.index.tolist()]
    # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
    

    The disadvantage of this simple method is that you have to manually specify the order of the levels you want. You can leverage itertools.compress to select them by name instead.

    from itertools import compress
    
    mask = [1 if name in ['a', 'c'] else 0 for name in df.index.names]
    [tuple(compress(t, mask)) for t in df.index.tolist()]
    # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
    
  2. Create a MultiIndex that has exactly the levels you want and call .tolist() on it.

    df.index.droplevel('b').tolist()
    # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
    

    If you would prefer to name the levels you want to keep, instead of those that you want to drop, you could do something like

    df.index.droplevel([level for level in df.index.names
                        if level not in ['a', 'c']]).tolist()
    # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
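
The second idea can be wrapped in a small function so that the levels to keep are passed as an argument. This is a sketch rather than part of the original answer: `levels_as_tuples` is a hypothetical helper name, not a pandas function, and it assumes at least two levels are kept (with a single level, droplevel returns a plain Index and .tolist() yields scalars rather than tuples).

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 3],
    'b': [4, 4, 5, 5, 6, 7],
    'c': [10, 11, 12, 13, 14, 15],
    'd': [16, 17, 18, 19, 20, 21],
}).set_index(['a', 'b', 'c'])


def levels_as_tuples(index, keep):
    """Return the values of the levels named in `keep` as a list of tuples.

    Hypothetical helper, not part of pandas. The levels come out in the
    order they have in `index`, not in the order given in `keep`.
    """
    # Drop every level whose name was not requested.
    to_drop = [name for name in index.names if name not in keep]
    return index.droplevel(to_drop).tolist()


result = levels_as_tuples(df.index, ['a', 'c'])
print(result)
```

Since droplevel only removes levels, passing `['c', 'a']` gives the same tuples as `['a', 'c']`; reordering would need a separate step.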
    

Answer by IanS

This is less cumbersome insofar as you can pass the list of index names you want to select:

df.reset_index()[['a', 'c']].to_dict(orient='split')['data']

I have not found a way of selecting levels 'a' and 'b' from the index object directly, hence the use of reset_index.

Note that to_dict returns a list of lists, not a list of tuples:

[[1, 10], [1, 11], [1, 12], [2, 13], [2, 14], [3, 15]]
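
If tuples are needed, as in the question, each inner list can be converted after the fact. This is a minimal sketch building on the reset_index call above; the names `rows` and `pairs` are illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 3],
    'b': [4, 4, 5, 5, 6, 7],
    'c': [10, 11, 12, 13, 14, 15],
    'd': [16, 17, 18, 19, 20, 21],
}).set_index(['a', 'b', 'c'])

# to_dict(orient='split') returns a dict whose 'data' entry is a list of row lists.
rows = df.reset_index()[['a', 'c']].to_dict(orient='split')['data']

# Convert each row list into a tuple.
pairs = [tuple(row) for row in rows]
print(pairs)
```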