pandas get_level_values for multiple columns
Original question: http://stackoverflow.com/questions/39080555/
Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page).
Asked by danielhadar
Is there a way to get the result of get_level_values for more than one column?
Given the following DataFrame:
         d
a b c
1 4 10  16
    11  17
  5 12  18
2 5 13  19
  6 14  20
3 7 15  21
I wish to get the values (i.e. a list of tuples) of levels a and c:
[(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
Notes:
- It is impossible to give get_level_values more than one level (e.g. df.index.get_level_values(['a','c'])).
- There's a workaround in which one could use get_level_values over each desired column and zip them together. For example:
a_list = df.index.get_level_values('a').values
c_list = df.index.get_level_values('c').values
print([i for i in zip(a_list,c_list)])
[(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
but it gets cumbersome as the number of columns grows.
- The code to build the example DataFrame:
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 3],
                   'b': [4, 4, 5, 5, 6, 7],
                   'c': [10, 11, 12, 13, 14, 15],
                   'd': [16, 17, 18, 19, 20, 21]}).set_index(['a', 'b', 'c'])
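The zip workaround described in the question can be generalized to any number of levels. The following is only a minimal sketch: level_tuples is a hypothetical helper name, not part of pandas, and it assumes every requested name is a level of the index.

```python
import pandas as pd

# Same example DataFrame as in the question.
df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 3],
    'b': [4, 4, 5, 5, 6, 7],
    'c': [10, 11, 12, 13, 14, 15],
    'd': [16, 17, 18, 19, 20, 21],
}).set_index(['a', 'b', 'c'])


def level_tuples(index, names):
    """Hypothetical helper: zip the values of the named levels into tuples."""
    # One list of values per requested level, in the order given by `names`.
    columns = [index.get_level_values(name).tolist() for name in names]
    # Transpose the per-level lists into one tuple per row.
    return list(zip(*columns))


result = level_tuples(df.index, ['a', 'c'])
print(result)
# [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
```

Because the levels are looked up one by one, the tuples follow the order of the names passed in, so level_tuples(df.index, ['c', 'a']) would put the c value first.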
Accepted answer by Alberto Garcia-Raboso
The .tolist() method of a MultiIndex gives a list of tuples for all the levels in the MultiIndex. For example, with your example DataFrame,
df.index.tolist()
# => [(1, 4, 10), (1, 4, 11), (1, 5, 12), (2, 5, 13), (2, 6, 14), (3, 7, 15)]
So here are two ideas:
- Get the list of tuples from the original MultiIndex and filter the result.
  [(a, c) for a, b, c in df.index.tolist()]
  # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
  The disadvantage of this simple method is that you have to manually specify the order of the levels you want. You can leverage itertools.compress to select them by name instead.
  from itertools import compress
  mask = [1 if name in ['a', 'c'] else 0 for name in df.index.names]
  [tuple(compress(t, mask)) for t in df.index.tolist()]
  # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
- Create a MultiIndex that has exactly the levels you want and call .tolist() on it.
  df.index.droplevel('b').tolist()
  # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
  If you would prefer to name the levels you want to keep, instead of those that you want to drop, you could do something like
  df.index.droplevel([level for level in df.index.names if level not in ['a', 'c']]).tolist()
  # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
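The "name the levels to keep" variant of the droplevel idea can be wrapped in a small function. This is a sketch under stated assumptions: keep_levels is a hypothetical name, and the function assumes at least two levels are kept (with a single level left, droplevel returns a plain Index and .tolist() yields scalars rather than tuples).

```python
import pandas as pd

# Same example DataFrame as in the question.
df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 3],
    'b': [4, 4, 5, 5, 6, 7],
    'c': [10, 11, 12, 13, 14, 15],
    'd': [16, 17, 18, 19, 20, 21],
}).set_index(['a', 'b', 'c'])


def keep_levels(index, keep):
    """Hypothetical helper: drop every level whose name is not in `keep`."""
    drop = [name for name in index.names if name not in keep]
    # The surviving levels stay in their original index order,
    # regardless of the order of the names in `keep`.
    return index.droplevel(drop).tolist()


kept = keep_levels(df.index, ['a', 'c'])
print(kept)
# [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
```

Note that, unlike zipping get_level_values results, this approach cannot reorder the levels: passing ['c', 'a'] gives the same tuples as passing ['a', 'c'].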
Answer by IanS
This is less cumbersome insofar as you can pass the list of index names you want to select:
df.reset_index()[['a', 'c']].to_dict(orient='split')['data']
I have not found a way of selecting levels 'a' and 'c' from the index object directly, hence the use of reset_index.
Note that to_dict returns a list of lists and not tuples:
[[1, 10], [1, 11], [1, 12], [2, 13], [2, 14], [3, 15]]
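If tuples are required, as in the question, the inner lists can be converted afterwards. The sketch below shows two possibilities; the variable names are illustrative only, and the itertuples variant is offered as an alternative rather than as part of the original answer.

```python
import pandas as pd

# Same example DataFrame as in the question.
df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 3],
    'b': [4, 4, 5, 5, 6, 7],
    'c': [10, 11, 12, 13, 14, 15],
    'd': [16, 17, 18, 19, 20, 21],
}).set_index(['a', 'b', 'c'])

# Move the index levels back into columns and keep only the wanted ones.
selected = df.reset_index()[['a', 'c']]

# Option 1: convert each inner list from to_dict into a tuple.
rows = selected.to_dict(orient='split')['data']
as_tuples = [tuple(row) for row in rows]

# Option 2: let itertuples produce plain tuples directly
# (index=False omits the row label, name=None avoids namedtuples).
via_itertuples = list(selected.itertuples(index=False, name=None))

print(as_tuples)
# [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
```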