Python Pandas: Get unique MultiIndex level values by label

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24495695/

Time: 2020-08-19 04:41:17  Source: igfitidea

Pandas: Get unique MultiIndex level values by label

python pandas

Asked by ojdo

Say you have this MultiIndex-ed DataFrame:


import pandas as pd

df = pd.DataFrame({'co':['DE','DE','FR','FR'],
                   'tp':['Lake','Forest','Lake','Forest'],
                   'area':[10,20,30,40],
                   'count':[7,5,2,3]})
df = df.set_index(['co','tp'])

Which looks like this:


           area  count
co tp
DE Lake      10      7
   Forest    20      5
FR Lake      30      2
   Forest    40      3

I would like to retrieve the unique values per index level. This can be accomplished using


df.index.levels[0]  # returns ['DE', 'FR']
df.index.levels[1]  # returns ['Forest', 'Lake'] (levels are stored sorted)

What I would really like to do is retrieve these lists by addressing the levels by their names, i.e. 'co' and 'tp'. The two shortest ways I could find look like this:

list(set(df.index.get_level_values('co')))  # returns ['DE', 'FR'] (in arbitrary order)
df.index.levels[df.index.names.index('co')]  # returns ['DE', 'FR']

But neither of them is very elegant. Is there a shorter way?

Accepted answer by Pietro Battiston

Pandas 0.23.0 finally introduced a much cleaner solution to this problem: the level argument to Index.unique():

In [3]: df.index.unique(level='co')
Out[3]: Index(['DE', 'FR'], dtype='object', name='co')

This is now the recommended solution. It is far more efficient because it avoids creating a complete representation of the level values in memory, and re-scanning it.
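
As a minimal sketch of the call above, assuming the example DataFrame from the question and pandas 0.23.0 or later, the level can be addressed either by name or by integer position. The variable names co_values and tp_values are illustrative only:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

# Address the level by its name ...
co_values = df.index.unique(level='co')
# ... or by its integer position.
tp_values = df.index.unique(level=1)

print(list(co_values))  # ['DE', 'FR']
print(list(tp_values))  # ['Lake', 'Forest'] (order of first appearance)
```

Unlike df.index.levels, the result follows the order in which the values first appear in the index rather than the sorted order of the stored level.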


Answer by Happy001

I guess you want the unique values in a certain level (addressed by level name) of a MultiIndex. I usually do the following, which is a bit long.

In [11]: df.index.get_level_values('co').unique()
Out[11]: array(['DE', 'FR'], dtype=object)

Answer by LeoRochael

If you're going to do the level lookup repeatedly, you could create a map from your index level names to each level's unique values with:

# Map each level name to the values stored for that level.
df_level_value_map = {
    name: level
    for name, level in zip(df.index.names, df.index.levels)
}
df_level_value_map['co']

But this is not in any way more efficient (or shorter) than your original attempts if you're only going to do this lookup once.


I really wish there was a method on indexes that returned such a dictionary (or series?) with a name like:


df.index.get_level_map(levels={...})

Here the levels parameter could limit the map to a subset of the existing levels. I could do without the parameter if it were a property like:

df.index.level_map
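
Neither get_level_map nor level_map exists in pandas; they are only wished for here. A rough sketch of a stand-alone helper with similar behavior might look like the following, assuming pandas 0.23.0 or later for Index.unique(level=...). The function name level_map and its levels parameter are hypothetical:

```python
import pandas as pd

def level_map(index, levels=None):
    """Hypothetical helper, not a pandas API.

    Returns a dict mapping each level name to that level's unique values.
    `levels` optionally restricts the map to a subset of level names.
    """
    names = index.names if levels is None else levels
    return {name: index.unique(level=name) for name in names}

# Example frame from the question.
df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

mapping = level_map(df.index)                # all levels
subset = level_map(df.index, levels=['tp'])  # only the 'tp' level

print(list(mapping['co']))  # ['DE', 'FR']
print(list(subset))         # ['tp']
```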

Answer by Hanan Shteingart

An alternative approach is to look up the level values by calling df.index.levels[level_index], where level_index can be inferred from df.index.names.index(level_name). In the above example, level_name = 'co'.

The answer proposed by @Happy001 computes the unique values over the full array of level values, which may be computationally expensive.
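
One caveat when reading df.index.levels instead of computing the unique values: after filtering, the stored levels can still contain values that no longer occur in the index. The sketch below illustrates this with the example frame from the question; the filter on 'area' and the name fr_only are chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

# Keep only the rows with area > 20, i.e. the two 'FR' rows.
fr_only = df[df['area'] > 20]

# The stored level still lists 'DE', although no row uses it any more.
print(list(fr_only.index.levels[0]))           # ['DE', 'FR']

# Computing the unique values reflects what is actually present.
print(list(fr_only.index.unique(level='co')))  # ['FR']

# remove_unused_levels() drops the stale entries from the stored levels.
cleaned = fr_only.index.remove_unused_levels()
print(list(cleaned.levels[0]))                 # ['FR']
```

So the levels-based lookup is cheap, but it is only equivalent to the unique values when no unused level entries remain.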


Answer by CyclicUniverse

If you already know the index names, is it not straightforward to simply do: df.reset_index()['co'].unique()? (Note that df['co'] on its own raises a KeyError for the example above, because 'co' is an index level rather than a column.)