pandas: MultiIndex Slicing requires the index to be fully lexsorted

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39876416/

Date: 2020-09-14 02:08:41  Source: igfitidea

python pandas

Asked by FooBar

I have a data frame with index (year, foo), where I would like to select the X largest observations of foo where year == someYear.

My approach was

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[pd.IndexSlice[2002, :10], :]

but I get

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

I tried different variants of sorting (e.g. ascending = [0, 0]), but they all resulted in some sort of error.

If I only wanted the xth row, I could use df.groupby(level=[0]).nth(x) after sorting, but since I want a set of rows, that doesn't feel very efficient.

What's the best way to select these rows? Some data to play with:

                   rank_int  rank
year foo                         
2015 1.381845             2   320
     1.234795             2   259
     1.148488           199     2
     0.866704             2   363
     0.738022             2   319

Answered by Danila Savenkov

First, you should sort like this:

df.sort_index(level=['year','foo'], ascending=[1, 0], inplace=True)

This should fix the KeyError. But df.loc[pd.IndexSlice[2002, :10], :] won't give you the result you are expecting. loc is label-based, unlike iloc, so :10 is read as a slice over the foo labels up to 10, not as the first ten rows. The secondary levels of a MultiIndex do not support iloc-style positional selection, so I would suggest using groupby. If you already have this MultiIndex, you should do:

df = df.reset_index()  # reset_index returns a new frame, so assign it
df = df.sort_values(by=['year','foo'],ascending=[True,False])
df.groupby('year').head(10)

If you need the n entries with the smallest foo, you can use tail(n). If you need, say, the first, third and fifth entries, you can use nth([0,2,4]), as you mentioned in the question. I think it's the most efficient way to do it.
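The groupby approach above can be tried end to end on a small frame. This is a minimal sketch rather than the answerer's exact code: the sample values and the names `flat`, `top2` and `bottom2` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: two years, four foo values each.
df = pd.DataFrame(
    {
        "year": [2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015],
        "foo": [0.5, 2.0, 1.0, 1.5, 1.2, 0.7, 1.4, 0.9],
        "rank": [10, 11, 12, 13, 20, 21, 22, 23],
    }
).set_index(["year", "foo"])

# Move the index levels back into columns, then sort:
# year ascending, foo descending within each year.
flat = df.reset_index().sort_values(by=["year", "foo"], ascending=[True, False])

# head(n) keeps the first n rows of each group, i.e. the n largest foo per year.
top2 = flat.groupby("year").head(2)

# tail(n) keeps the last n rows of each group, i.e. the n smallest foo per year.
bottom2 = flat.groupby("year").tail(2)

print(top2)
print(bottom2)
```

Both head and tail keep the rows in their sorted order, so the result stays grouped by year.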

Answered by ASGM

ascending should be a boolean, not a list. Try sorting this way:

df.sort_index(ascending=True, inplace=True)
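As a rough illustration of this suggestion, the sketch below sorts a deliberately unsorted index in plain ascending order and then takes a label-based slice on the second level. The sample data and the name `subset` are assumptions, and the slice bounds are foo labels, not row positions.

```python
import pandas as pd

# Hypothetical sample data with a deliberately unsorted (year, foo) index.
df = pd.DataFrame(
    {"rank": [10, 11, 12, 13, 20, 21]},
    index=pd.MultiIndex.from_tuples(
        [(2015, 1.5), (2014, 0.5), (2015, 1.0), (2014, 2.0), (2015, 2.0), (2014, 1.0)],
        names=["year", "foo"],
    ),
)

# Sort every level in ascending order so the index is lexsorted.
df.sort_index(ascending=True, inplace=True)
print(df.index.is_monotonic_increasing)

# Label-based slice on the second level:
# rows for 2015 whose foo label lies between 1.0 and 1.5 (both inclusive).
subset = df.loc[pd.IndexSlice[2015, 1.0:1.5], :]
print(subset)
```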

Answered by FooBar

To get the xth observations of the second level as wanted, one can combine loc with iloc:

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[2015].iloc[:10]
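A self-contained version of the same idea might look like the sketch below. The sample values and the name `top2_2015` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data indexed by (year, foo).
df = pd.DataFrame(
    {
        "year": [2014, 2014, 2014, 2015, 2015, 2015, 2015],
        "foo": [0.5, 2.0, 1.0, 1.2, 0.7, 1.4, 0.9],
        "rank": [10, 11, 12, 20, 21, 22, 23],
    }
).set_index(["year", "foo"])

# year ascending, foo descending within each year.
df = df.sort_index(level=[0, 1], ascending=[True, False])

# loc picks the year by label; the result is indexed by foo alone.
# iloc then takes the first rows by position, i.e. the largest foo values.
top2_2015 = df.loc[2015].iloc[:2]
print(top2_2015)
```

Splitting the selection into a label step and a positional step avoids slicing the second level by label at all.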

This works as expected. It does not, however, answer the weird index locking with respect to lexsorting.

Answered by tsando

For me it worked by using sort_index(axis=1):

df = df.sort_index(axis=1)

Once you do this, you can use slice or pandas.IndexSlice, e.g.:

idx = pd.IndexSlice  # short alias used in the slice below
df.loc[:, idx[:, 'A']]
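For the case where the MultiIndex is on the columns, a small runnable sketch could look like this. The column labels, the values and the name `only_a` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose column MultiIndex is not sorted.
columns = pd.MultiIndex.from_tuples(
    [("y", "B"), ("x", "A"), ("y", "A"), ("x", "B")]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# Sort the columns (axis=1) so the column index is lexsorted.
df = df.sort_index(axis=1)

# idx is just a short alias for pd.IndexSlice.
idx = pd.IndexSlice

# All first-level labels, second-level label 'A' only.
only_a = df.loc[:, idx[:, "A"]]
print(only_a)
```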