Python: Get unique values from an index column in a MultiIndex

Question

Asked by seth

I know that I can get the unique values of an index level of a DataFrame by resetting the index, but is there a way to avoid this step and get the unique values directly?

Given I have:

        C
 A B     
 0 one  3
 1 one  2
 2 two  1

I can do:

df = df.reset_index()
uniq_b = df.B.unique()
df = df.set_index(['A','B'])

Is there a built-in way to do this in pandas?

Answer 1

Accepted answer by Andy Hayden

One way is to use index.levels:

In [11]: df
Out[11]: 
       C
A B     
0 one  3
1 one  2
2 two  1

In [12]: df.index.levels[1]
Out[12]: Index([one, two], dtype=object)
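
A minimal sketch of the same lookup as a standalone script, assuming the frame from the question is rebuilt with set_index (the construction and the names b_pos and uniq_b are illustrative, not from the original answer). It looks up the level position by name instead of hard-coding 1:

```python
import pandas as pd

# Rebuild the frame from the question: a two-level index (A, B) and one column C.
df = pd.DataFrame(
    {"A": [0, 1, 2], "B": ["one", "one", "two"], "C": [3, 2, 1]}
).set_index(["A", "B"])

# Find the position of level "B" by name rather than hard-coding 1.
b_pos = list(df.index.names).index("B")

# index.levels holds one Index of labels per level of the MultiIndex.
uniq_b = df.index.levels[b_pos]

print(list(uniq_b))
```

Note that index.levels lists the labels each level knows about, which is not always the same as the labels currently present in the rows.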

Answer 2

Answer by 8one6

Andy Hayden's answer (index.levels[blah]) is great for some scenarios, but can lead to odd behavior in others. My understanding is that Pandas goes to great lengths to "reuse" indices when possible to avoid having the indices of lots of similarly-indexed DataFrames taking up space in memory. As a result, I've found the following annoying behavior:

import pandas as pd
import numpy as np

np.random.seed(0)

idx = pd.MultiIndex.from_product([['John', 'Josh', 'Alex'], list('abcde')], 
                                 names=['Person', 'Letter'])
large = pd.DataFrame(data=np.random.randn(15, 2), 
                     index=idx, 
                     columns=['one', 'two'])
small = large.loc[['Jo'==d[0:2] for d in large.index.get_level_values('Person')]]

print(small.index.levels[0])
print(large.index.levels[0])

This outputs

Index([u'Alex', u'John', u'Josh'], dtype='object')
Index([u'Alex', u'John', u'Josh'], dtype='object')

rather than the expected

Index([u'John', u'Josh'], dtype='object')
Index([u'Alex', u'John', u'Josh'], dtype='object')
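
If the levels themselves need to reflect only the labels still in use, pandas provides MultiIndex.remove_unused_levels(). The sketch below is an illustration rather than part of the original answer. It assumes a pandas version that includes that method, and it swaps the random data for zeros since the values do not matter here:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["John", "Josh", "Alex"], list("abcde")], names=["Person", "Letter"]
)
large = pd.DataFrame(np.zeros((15, 2)), index=idx, columns=["one", "two"])

# Keep only the rows whose Person label starts with "Jo".
mask = large.index.get_level_values("Person").str.startswith("Jo")
small = large.loc[mask]

# The filtered frame still carries all three labels in its first level.
stale = list(small.index.levels[0])

# remove_unused_levels() returns a new MultiIndex without the unused labels.
pruned = list(small.index.remove_unused_levels().levels[0])

print(stale)
print(pruned)
```

The call returns a new index and leaves small unchanged, so assign the result back (small.index = small.index.remove_unused_levels()) if the pruned levels should persist.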

As one person pointed out on the other thread, one idiom that seems very natural and works properly would be:

small.index.get_level_values('Person').unique()
large.index.get_level_values('Person').unique()
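
A self-contained sketch of this idiom on a smaller, hypothetical frame (frame and subset are illustrative names). It also tries MultiIndex.unique(level=...), which appears to give the same result in recent pandas versions, and shows that both return labels in order of appearance rather than sorted as index.levels does:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["John", "Josh", "Alex"], list("ab")], names=["Person", "Letter"]
)
frame = pd.DataFrame({"one": range(6)}, index=idx)

# Filter down to the rows whose Person label starts with "Jo".
mask = frame.index.get_level_values("Person").str.startswith("Jo")
subset = frame.loc[mask]

# Unique labels actually present in the filtered rows.
via_values = list(subset.index.get_level_values("Person").unique())

# Same lookup through MultiIndex.unique with a level argument.
via_unique = list(subset.index.unique(level="Person"))

# On the full frame, unique() keeps order of appearance; levels are sorted.
appearance_order = list(frame.index.get_level_values("Person").unique())
sorted_levels = list(frame.index.levels[0])

print(via_values, via_unique)
print(appearance_order, sorted_levels)
```

Because the result follows row order, sort it explicitly if a stable ordering matters.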

I hope this helps someone else dodge the super-unexpected behavior that I ran into.
