pandas: MultiIndex Slicing requires the index to be fully lexsorted
Notice: this page is adapted from a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/39876416/
Asked by FooBar
I have a data frame with index (year, foo), where I would like to select the X largest observations of foo where year == someYear.
My approach was
df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[pd.IndexSlice[2002, :10], :]
but I get
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'
I tried different variants of sorting (e.g. ascending = [0, 0]), but they all resulted in some sort of error.
If I only wanted the xth row, I could use df.groupby(level=[0]).nth(x) after sorting, but since I want a set of rows, that doesn't feel very efficient.
What's the best way to select these rows? Some data to play with:
               rank_int  rank
year foo
2015 1.381845         2   320
     1.234795         2   259
     1.148488       199     2
     0.866704         2   363
     0.738022         2   319
Answered by Danila Savenkov
First, you should sort like this:
df.sort_index(level=['year','foo'], ascending=[1, 0], inplace=True)
It should fix the KeyError. But df.loc[pd.IndexSlice[2002, :10], :] won't give you the result you are expecting. loc selects by label, not by position like iloc, so :10 is interpreted as a range of foo label values rather than as the first ten rows. The secondary levels of a MultiIndex do not support positional (iloc-style) selection, so I would suggest using groupby. If you already have this MultiIndex, you should do:
df = df.reset_index()  # reset_index returns a new frame, so assign the result
df = df.sort_values(by=['year','foo'], ascending=[True, False])
df.groupby('year').head(10)
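The steps above can be tried end to end with a small frame. This is only a minimal sketch: the data, the variable names (flat, top_n, top_2015) and the choice of n = 2 are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: two years, with foo as the second index level.
df = pd.DataFrame(
    {
        "year": [2015, 2015, 2015, 2014, 2014, 2014],
        "foo": [0.9, 1.4, 1.1, 0.3, 0.7, 0.5],
        "rank": [363, 320, 259, 12, 8, 10],
    }
).set_index(["year", "foo"])

n = 2  # number of largest foo values to keep per year (illustrative)

flat = df.reset_index()  # turn the index levels back into ordinary columns
flat = flat.sort_values(by=["year", "foo"], ascending=[True, False])
top_n = flat.groupby("year").head(n)  # first n rows of each year group

# Filter afterwards if only a single year is of interest.
top_2015 = top_n[top_n["year"] == 2015]
print(top_2015)
```

Because head(n) keeps the rows in their sorted order, the largest foo values of each year come first in the result.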
If you need the n entries with the smallest foo, you can use tail(n). If you need, say, the first, third and fifth entries, you can use nth([0,2,4]), as you mentioned in the question. I think it's the most efficient way one could do it.
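As a rough illustration of tail(n) and nth([...]) on a frame that is already sorted by year ascending and foo descending, the following sketch uses made-up data and variable names. The index and columns of the nth result can differ between pandas versions, so only the foo values are inspected here.

```python
import pandas as pd

# Hypothetical data, already sorted by year ascending and foo descending.
flat = pd.DataFrame(
    {
        "year": [2014, 2014, 2014, 2015, 2015, 2015],
        "foo": [0.7, 0.5, 0.3, 1.4, 1.1, 0.9],
    }
)

grouped = flat.groupby("year")

# Last 2 rows of each year, i.e. the 2 smallest foo values per year.
smallest_two = grouped.tail(2)

# Rows at positions 0 and 2 within each year (first and third entries).
first_and_third = grouped.nth([0, 2])

print(smallest_two["foo"].tolist())
print(sorted(first_and_third["foo"].tolist()))
```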
Answered by ASGM
ascending should be a boolean, not a list. Try sorting this way:
df.sort_index(ascending=True, inplace=True)
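One way to see why a plain ascending sort helps is to check whether the index is monotonic before slicing. The sketch below is an illustration under assumptions: it reuses the foo values from the question with a single hypothetical rank column, and the names mixed, asc and subset are invented for this example.

```python
import pandas as pd

# Hypothetical frame using the foo values from the question's sample data.
index = pd.MultiIndex.from_tuples(
    [(2015, 1.148488), (2015, 1.381845), (2015, 0.738022),
     (2015, 1.234795), (2015, 0.866704)],
    names=["year", "foo"],
)
df = pd.DataFrame({"rank": [2, 320, 319, 259, 363]}, index=index)

idx = pd.IndexSlice

# Sorting foo in descending order leaves the index not fully lexsorted ...
mixed = df.sort_index(level=[0, 1], ascending=[True, False])
print(mixed.index.is_monotonic_increasing)

# ... so a label slice on the second level is rejected.
try:
    mixed.loc[idx[2015, 0.866704:1.234795], :]
    slice_failed = False
except KeyError as err:  # pandas raises KeyError (or a subclass of it) here
    slice_failed = True
    print(err)

# A plain ascending sort yields a fully lexsorted index, and the slice works.
asc = df.sort_index()
print(asc.index.is_monotonic_increasing)
subset = asc.loc[idx[2015, 0.866704:1.234795], :]
print(subset)
```

Note that the slice bounds are foo labels and both ends are included, so this selects a range of foo values, not a number of rows.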
Answered by FooBar
To get the first x observations of the second level, as wanted, one can combine loc with iloc:
df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[2015].iloc[:10]
This works as expected. It does not, however, explain the strange index behavior with respect to lexsorting.
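A self-contained sketch of the loc plus iloc combination follows. The frame reuses the foo values from the question, while the 2014 rows, the single rank column and n = 3 are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical frame: the 2015 foo values come from the question,
# the 2014 rows are invented so that more than one year is present.
index = pd.MultiIndex.from_tuples(
    [(2015, 1.148488), (2014, 0.5), (2015, 1.381845), (2015, 0.738022),
     (2014, 0.9), (2015, 1.234795), (2015, 0.866704)],
    names=["year", "foo"],
)
df = pd.DataFrame({"rank": [2, 11, 320, 319, 7, 259, 363]}, index=index)

# Sort year ascending and foo descending, as in the answer.
df = df.sort_index(level=[0, 1], ascending=[True, False])

n = 3  # illustrative number of rows to keep

# loc picks the year by label; iloc then takes the first n rows by position.
top_n = df.loc[2015].iloc[:n]
print(top_n)
```

The result is indexed by foo only, because selecting a single year with loc drops the year level.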
Answered by tsando
For me it worked by using sort_index(axis=1):
df = df.sort_index(axis=1)
Once you do this, you can use slice or pandas.IndexSlice, e.g.:
idx = pd.IndexSlice
df.loc[:, idx[:, 'A']]
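For a case where the MultiIndex is on the columns, a minimal sketch could look like this. The column labels, the values and the name only_a are hypothetical and chosen only to make the example runnable.

```python
import pandas as pd

# Hypothetical frame with a two-level column index in unsorted order.
columns = pd.MultiIndex.from_tuples(
    [("b", "A"), ("a", "B"), ("a", "A"), ("b", "B")]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# Lexsort the column MultiIndex.
df = df.sort_index(axis=1)

idx = pd.IndexSlice
# Every first-level key, restricted to second-level label "A".
only_a = df.loc[:, idx[:, "A"]]
print(only_a)
```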