pandas MultiIndex slicing requires the index to be fully lexsorted

Question

Asked by FooBar

I have a data frame with index (year, foo), where I would like to select the X largest observations of foo where year == someYear.

My approach was

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[pd.IndexSlice[2002, :10], :]

but I get

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

I tried different variants of sorting (e.g. ascending = [0, 0]), but they all resulted in some sort of error.

If I only wanted the x-th row, I could use df.groupby(level=[0]).nth(x) after sorting, but since I want a set of rows, that doesn't feel very efficient.

What's the best way to select these rows? Some data to play with:

                   rank_int  rank
year foo                         
2015 1.381845             2   320
     1.234795             2   259
     1.148488           199     2
     0.866704             2   363
     0.738022             2   319
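A minimal sketch that rebuilds the sample data above and reproduces the problem, assuming a recent pandas version (there the exception is pandas.errors.UnsortedIndexError, a subclass of KeyError, and the message wording differs from the one quoted). It uses 2015 because that is the only year in the sample. Sorting foo in descending order leaves the index not lexsorted on its second level, and a bounded slice on that level (:10) is what triggers the error.

```python
import pandas as pd

# Rebuild the sample data from the question.
index = pd.MultiIndex.from_arrays(
    [[2015] * 5, [1.381845, 1.234795, 1.148488, 0.866704, 0.738022]],
    names=["year", "foo"],
)
df = pd.DataFrame(
    {"rank_int": [2, 2, 199, 2, 2], "rank": [320, 259, 2, 363, 319]},
    index=index,
)

# Sort year ascending but foo descending, as in the question.
df.sort_index(level=[0, 1], ascending=[True, False], inplace=True)

# foo now runs from largest to smallest, so the index is not lexsorted on level 1.
print(df.index.is_monotonic_increasing)  # False

try:
    df.loc[pd.IndexSlice[2015, :10], :]
    raised = False
except KeyError as err:  # UnsortedIndexError subclasses KeyError
    raised = True
    print(type(err).__name__, err)
```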

Answer 1

Answered by Danila Savenkov

First, you should sort like this:

df.sort_index(level=['year','foo'], ascending=[1, 0], inplace=True)

It should fix the KeyError. But df.loc[pd.IndexSlice[2002, :10], :] won't give you the result you are expecting. loc is label-based, unlike iloc, so :10 is treated as a range of foo labels (everything up to 10), not as the first ten rows. Positional (iloc-style) selection within an inner MultiIndex level isn't supported directly, so I would suggest using groupby. If you already have this MultiIndex, you should do:

df = df.reset_index()  # reset_index returns a new frame, so assign it
df = df.sort_values(by=['year', 'foo'], ascending=[True, False])
df.groupby('year').head(10)

If you need the n entries with the smallest foo, you can use tail(n). If you need, say, the first, third and fifth entries, you can use nth([0, 2, 4]), as you mentioned in the question. I think it's the most efficient way to do it.
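The groupby approach can be run on the sample data as follows. This is a sketch assuming the year/foo MultiIndex from the question; n = 3 is an arbitrary choice because the sample only has five rows, and the variable names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [[2015] * 5, [1.381845, 1.234795, 1.148488, 0.866704, 0.738022]],
    names=["year", "foo"],
)
df = pd.DataFrame(
    {"rank_int": [2, 2, 199, 2, 2], "rank": [320, 259, 2, 363, 319]},
    index=index,
)

# Move year/foo into ordinary columns, then order foo from largest to smallest.
flat = df.reset_index().sort_values(by=["year", "foo"], ascending=[True, False])

n = 3
largest = flat.groupby("year").head(n)         # n largest foo per year
smallest = flat.groupby("year").tail(n)        # n smallest foo per year
picked = flat.groupby("year").nth([0, 2, 4])   # 1st, 3rd and 5th rows per year
print(largest)
```

head and tail keep the original row index; the shape of the nth result's index differs between pandas versions, but the foo column is present either way.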

Answer 2

Answered by ASGM

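Pass a single ascending=True instead of a list that sorts foo in descending order; MultiIndex slicing with .loc needs the levels sorted in ascending (lexsorted) order. Try sorting this way: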

df.sort_index(ascending=True, inplace=True)
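A sketch of this on the sample data, assuming pandas 1.x or later. After an all-ascending sort the slice from the question no longer raises. With .loc, :10 selects foo labels up to 10 rather than the first ten rows, so on this sample it returns all five rows; a narrower label range (0.8 to 1.2, chosen arbitrarily) shows the label-based behaviour more clearly.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [[2015] * 5, [1.381845, 1.234795, 1.148488, 0.866704, 0.738022]],
    names=["year", "foo"],
)
df = pd.DataFrame(
    {"rank_int": [2, 2, 199, 2, 2], "rank": [320, 259, 2, 363, 319]},
    index=index,
)

# Every level ascending: the MultiIndex is now fully lexsorted.
df.sort_index(ascending=True, inplace=True)

# Label-based: all foo values in the 2015 block up to 10 (inclusive).
by_label = df.loc[pd.IndexSlice[2015, :10], :]

# A narrower label range: foo between 0.8 and 1.2.
window = df.loc[pd.IndexSlice[2015, 0.8:1.2], :]
print(window)
```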

Answer 3

Answered by FooBar

To get the first x observations of the second level, as wanted, one can combine loc with iloc:

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[2015].iloc[:10]

works as expected. This does not explain the strange lexsort requirement on the index, however.
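A sketch of this loc/iloc combination applied to the sample data (n = 3 is an arbitrary choice). Selecting a single year with .loc drops the year level, so the following .iloc works on plain row positions within that year and no slice on the foo level is needed.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [[2015] * 5, [1.381845, 1.234795, 1.148488, 0.866704, 0.738022]],
    names=["year", "foo"],
)
df = pd.DataFrame(
    {"rank_int": [2, 2, 199, 2, 2], "rank": [320, 259, 2, 363, 319]},
    index=index,
)

# year ascending, foo descending: largest foo first within each year.
df.sort_index(level=[0, 1], ascending=[True, False], inplace=True)

n = 3
# .loc[2015] returns a frame indexed by foo only; .iloc then takes positions.
top = df.loc[2015].iloc[:n]
print(top)
```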

Answer 4

Answered by tsando

What worked for me was sort_index(axis=1):

df = df.sort_index(axis=1)

Once you do this, you can use slice or pandas.IndexSlice on the column MultiIndex, e.g.:

idx = pd.IndexSlice
df.loc[:, idx[:, 'A']]
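Note that axis=1 sorts the column labels, so this applies when the MultiIndex is on the columns rather than on the rows as in the question. A minimal sketch with a hypothetical two-level column index (the labels 'x', 'y', 'A' and 'B' are made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose columns are an unsorted two-level MultiIndex.
columns = pd.MultiIndex.from_tuples([("y", "B"), ("x", "A"), ("y", "A"), ("x", "B")])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# Sort the column labels so the column MultiIndex is lexsorted.
df = df.sort_index(axis=1)

idx = pd.IndexSlice
only_a = df.loc[:, idx[:, "A"]]        # every first-level label, second level "A"
x_to_y = df.loc[:, idx["x":"y", "B"]]  # a bounded range on the first level
print(only_a)
```

The bounded range "x":"y" is the kind of slice that needs the lexsorted columns; the plain ":" in the first selection does not restrict the level at all.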
