Python Pandas: Get unique MultiIndex level values by label
Original source: http://stackoverflow.com/questions/24495695/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas: Get unique MultiIndex level values by label
Asked by ojdo
Say you have this MultiIndex-ed DataFrame:
import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]})
df = df.set_index(['co', 'tp'])
Which looks like this:
           area  count
co tp
DE Lake      10      7
   Forest    20      5
FR Lake      30      2
   Forest    40      3
I would like to retrieve the unique values per index level. This can be accomplished using
df.index.levels[0]  # returns ['DE', 'FR']
df.index.levels[1]  # returns ['Forest', 'Lake'] (levels are stored sorted)
What I would really like to do is retrieve these lists by addressing the levels by name, i.e. 'co' and 'tp'. The two shortest ways I could find look like this:
list(set(df.index.get_level_values('co'))) # returns ['DE', 'FR']
df.index.levels[df.index.names.index('co')] # returns ['DE', 'FR']
But neither of them is very elegant. Is there a shorter way?
Accepted answer by Pietro Battiston
Pandas 0.23.0 finally introduced a much cleaner solution to this problem: the level argument to Index.unique():
In [3]: df.index.unique(level='co')
Out[3]: Index(['DE', 'FR'], dtype='object', name='co')
This is now the recommended solution. It is far more efficient because it avoids creating a complete representation of the level values in memory, and re-scanning it.
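As a minimal sketch of the call above, assuming pandas 0.23.0 or later and the example frame from the question, each level can be looked up by its name:

```python
import pandas as pd

# Example frame from the question, indexed by country ('co') and type ('tp').
df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

# Unique values of each index level, addressed by level name.
co_values = df.index.unique(level='co')
tp_values = df.index.unique(level='tp')

print(list(co_values))
print(list(tp_values))
```

The result is an Index that keeps the level name, so list(...) is only needed when a plain Python list is wanted.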
Answer by Happy001
I guess you want the unique values in a certain level of a MultiIndex, selected by level name. I usually do the following, which is a bit long.
In [11]: df.index.get_level_values('co').unique()
Out[11]: array(['DE', 'FR'], dtype=object)
Answer by LeoRochael
If you're going to do the level lookup repeatedly, you could create a map of your index level names to level unique values with:
df_level_value_map = {
    name: level
    for name, level in zip(df.index.names, df.index.levels)
}
df_level_value_map['co']
But this is not in any way more efficient (or shorter) than your original attempts if you're only going to do this lookup once.
I really wish there was a method on indexes that returned such a dictionary (or series?) with a name like:
df.index.get_level_map(levels={...})
Where the levels parameter can limit the map to a subset of the existing levels. I could do without the parameter if it could be a property like:
df.index.level_map
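Until such a method exists, a small helper can approximate it. The following is only a sketch: level_map is a hypothetical function name, not part of pandas, and it assumes pandas 0.23.0 or later so that Index.unique(level=...) is available.

```python
import pandas as pd


def level_map(index, levels=None):
    """Hypothetical helper (not a pandas API): map level names to unique values.

    `levels` optionally restricts the map to a subset of the level names.
    """
    names = index.names if levels is None else levels
    return {name: index.unique(level=name) for name in names}


# Example frame from the question.
df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

full_map = level_map(df.index)                 # all levels
co_only = level_map(df.index, levels=['co'])   # restricted to one level

print(list(full_map['co']))
```

Building the map once pays off only when the lookup is repeated, as noted above.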
Answer by Hanan Shteingart
An alternative approach is to read the level's values directly by calling df.index.levels[level_index], where level_index can be inferred from df.index.names.index(level_name). In the above example, level_name = 'co'.
The answer proposed by @Happy001 computes the unique values from the full set of level values, which may be computationally intensive.
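One trade-off is worth weighing when choosing between the two lookups: df.index.levels holds the stored level values, which can include values that no longer occur after the frame has been filtered. A minimal sketch, assuming the example frame from the question and a hypothetical filter on area:

```python
import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

# Hypothetical filter for illustration: only the 'FR' rows remain.
subset = df[df['area'] > 20]

# Stored level values: cheap to read, but 'DE' is still listed.
stored = list(subset.index.levels[subset.index.names.index('co')])

# Values actually present in the filtered index.
observed = list(subset.index.get_level_values('co').unique())

# Dropping unused levels first makes the stored values match.
cleaned = list(subset.index.remove_unused_levels().levels[0])

print(stored, observed, cleaned)
```

So the levels lookup is the cheaper one, but it answers "which values does the index know about" rather than "which values occur in the index".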
Answer by CyclicUniverse
If you already know the index names, is it not straightforward to simply turn the index levels back into columns and do:
df.reset_index()['co'].unique()
?