Python Pandas: Get unique MultiIndex level values by label

Question

Asked by ojdo

Say you have this MultiIndex-ed DataFrame:

import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]})
df = df.set_index(['co', 'tp'])

Which looks like this:

           area  count
co tp
DE Lake      10      7
   Forest    20      5
FR Lake      30      2
   Forest    40      3

I would like to retrieve the unique values per index level. This can be accomplished using

df.index.levels[0]  # returns ['DE', 'FR']
df.index.levels[1]  # returns ['Forest', 'Lake'] (levels are stored sorted)

What I would really like to do is retrieve these lists by addressing the levels by their names, i.e. 'co' and 'tp'. The two shortest ways I could find look like this:

list(set(df.index.get_level_values('co')))  # returns ['DE', 'FR'] (set order is not guaranteed)
df.index.levels[df.index.names.index('co')]  # returns ['DE', 'FR']

But neither of them is very elegant. Is there a shorter way?

Answer 1

Accepted answer by Pietro Battiston

Pandas 0.23.0 finally introduced a much cleaner solution to this problem: the level argument to Index.unique():

In [3]: df.index.unique(level='co')
Out[3]: Index(['DE', 'FR'], dtype='object', name='co')

This is now the recommended solution. It is far more efficient because it avoids creating a complete representation of the level values in memory, and re-scanning it.
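
As a self-contained sketch of the call above, the following rebuilds the question's DataFrame and looks up both levels by name. The variable names co_values and tp_values are only illustrative. Note that the values come back in order of first appearance, whereas df.index.levels stores them sorted.

```python
import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

# Unique values of each level, addressed by name (needs pandas >= 0.23.0).
co_values = df.index.unique(level='co')
tp_values = df.index.unique(level='tp')

print(list(co_values))  # ['DE', 'FR']
print(list(tp_values))  # ['Lake', 'Forest'] (order of first appearance)
```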

Answer 2

Answer by Happy001

I guess you want the unique values in a certain level (addressed by level name) of a MultiIndex. I usually do the following, which is a bit long.

In [11]: df.index.get_level_values('co').unique()
Out[11]: array(['DE', 'FR'], dtype=object)

Answer 3

Answer by LeoRochael

If you're going to do the level lookup repeatedly, you could create a map from your index level names to each level's unique values with:

df_level_value_map = {
    name: level
    for name, level in zip(df.index.names, df.index.levels)
}
df_level_value_map['']

But this is not in any way more efficient (or shorter) than your original attempts if you're only going to do this lookup once.

I really wish there were a method on indexes that returned such a dictionary (or Series?), with a name like:

df.index.get_level_map(levels={...})

Here the levels parameter would limit the map to a subset of the existing levels. I could do without the parameter if it were a property like:

df.index.level_map
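
Neither get_level_map nor level_map exists in pandas as far as I know, but the wished-for behaviour can be approximated with a small free function. The sketch below is only an illustration: the function name and its levels parameter are hypothetical, and it uses Index.unique(level=...) (pandas 0.23.0 or later) rather than index.levels, so the result contains only values actually present in the index.

```python
import pandas as pd


def get_level_map(index, levels=None):
    """Hypothetical helper (not a pandas API).

    Map each level name to that level's unique values. `levels` optionally
    restricts the map to a subset of the level names.
    """
    names = index.names if levels is None else levels
    return {name: index.unique(level=name) for name in names}


df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

level_map = get_level_map(df.index)
print(list(level_map['co']))  # ['DE', 'FR']

# Restrict the map to a single level.
tp_only = get_level_map(df.index, levels=['tp'])
print(list(tp_only))  # ['tp']
```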

Answer 4

Answer by Hanan Shteingart

An alternative approach is to retrieve the level values by calling df.index.levels[level_index], where level_index can be inferred from df.index.names.index(level_name). In the above example, level_name = 'co'.

The answer proposed by @Happy001 computes the unique values, which may be computationally intensive.
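
One trade-off to keep in mind when choosing between the two: index.levels holds the stored labels of a level, which can include labels that no longer occur after rows have been filtered out. The sketch below illustrates this on the question's DataFrame; the filter condition and the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

subset = df[df['area'] > 20]  # keeps only the 'FR' rows

# Stored labels of the level: still includes the now-unused 'DE'.
level_index = subset.index.names.index('co')
stored = list(subset.index.levels[level_index])

# Values actually present in the filtered index.
present = list(subset.index.unique(level='co'))

# Dropping unused labels first makes .levels agree with the data.
cleaned = list(subset.index.remove_unused_levels().levels[level_index])

print(stored)   # ['DE', 'FR']
print(present)  # ['FR']
print(cleaned)  # ['FR']
```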

Answer 5

Answer by CyclicUniverse

If you already know the index names, isn't it more straightforward to simply do df['co'].unique()?
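
In the question's DataFrame, 'co' is an index level rather than a column, so df['co'] raises a KeyError there. A minimal sketch of this column-based idea, assuming the index levels are first moved back into columns with reset_index():

```python
import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

# 'co' lives in the index, not in the columns.
print('co' in df.columns)  # False

# Move the index levels back into ordinary columns, then use Series.unique().
flat = df.reset_index()
co_from_column = flat['co'].unique()
print(list(co_from_column))  # ['DE', 'FR']
```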
