What exactly is the lexsort_depth of a pandas multi-index DataFrame?

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/27116739/

Time: 2020-09-13 22:42:41  Source: igfitidea

Tags: python, numpy, pandas

Asked by Amelio Vazquez-Reina

What exactly is the lexsort_depth of a multi-index dataframe? Why does it have to be sorted for indexing?

For example, I have noticed that, after manually building a multi-index dataframe df with columns organized in three levels, if I try to do:

idx = pd.IndexSlice
df[idx['foo', 'bar']]

I get:

KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'

and at this point, df.columns.lexsort_depth is 0.

However, if I do the following, as recommended here and here:

df = df.sortlevel(0,axis=1)

then the cross-section indexing works. Why? What exactly is lexsort_depth, and why does sorting with sortlevel fix this type of indexing?

Accepted answer by Jeff

lexsort_depth is the number of levels of a multi-index that are sorted lexically, that is, in a-b-c-1-2-3 order (the normal sort order).
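
To make the definition concrete, here is a minimal pure-Python sketch that counts how many leading levels of a list of key tuples are in lexical order. The function name and the sample keys are hypothetical and chosen only for illustration; this is not the pandas implementation, which works on the index's internal integer codes rather than on the labels themselves.

```python
def lexsort_depth(keys):
    """Return how many leading levels of `keys` are in lexical order.

    `keys` is a list of equal-length tuples, one tuple per row or column.
    Illustrative helper only; not the pandas implementation.
    """
    if not keys:
        return 0
    nlevels = len(keys[0])
    # Try the deepest prefix first, then progressively shallower ones.
    for depth in range(nlevels, 0, -1):
        prefixes = [key[:depth] for key in keys]
        if all(a <= b for a, b in zip(prefixes, prefixes[1:])):
            return depth
    return 0


sorted_keys = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]
partly_sorted = [("a", 2), ("a", 1), ("b", 2), ("b", 1)]
unsorted_keys = [("b", 1), ("a", 1), ("b", 2), ("a", 2)]

print(lexsort_depth(sorted_keys))    # both levels are in order
print(lexsort_depth(partly_sorted))  # only the first level is in order
print(lexsort_depth(unsorted_keys))  # not even the first level is in order
```

Under this reading, the error in the question says that a two-level key was requested while no leading level of the columns was sorted.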

So element indexing will work even if a multi-index is not sorted, but the lookups may be quite a bit slower (in 0.15.2, this will show a PerformanceWarning for these kinds of lookups; see here).

The reason that sorting is a good idea in general is that pandas can use hash-based indexing to work out the location within each level independently; you can then use these indexers to find the final locations.
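
One way to picture the per-level lookup is sketched below in plain Python. Each level gets its own dict (a hash table) mapping a label to an integer code, and a key is translated level by level before the final position is searched for. All names here (levels, label_to_code, encode, locate) are assumptions made for this illustration and do not mirror pandas internals.

```python
# Hypothetical two-level keys, already in lexical order.
keys = [("bar", "one"), ("bar", "two"), ("foo", "one"), ("foo", "two")]
nlevels = 2

# One sorted list of unique labels per level.
levels = [sorted({key[i] for key in keys}) for i in range(nlevels)]

# One hash table per level: label -> integer code.
label_to_code = [
    {label: code for code, label in enumerate(level)} for level in levels
]


def encode(key):
    """Translate each label of `key` to its code, one level at a time."""
    return tuple(label_to_code[i][label] for i, label in enumerate(key))


# Integer-coded version of every key.
codes = [encode(key) for key in keys]


def locate(key):
    """Find the position of a full key using the per-level codes."""
    return codes.index(encode(key))


print(codes)
print(locate(("foo", "one")))
```

The final step here is a plain list search to keep the sketch short; the point is only that each level is resolved independently through its own hash lookup.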

Pandas takes advantage of np.searchsorted to find these locations when the index is sorted. If it is not sorted, it has to fall back to different (slower) methods.
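
The difference between the two cases can be sketched with the standard library's bisect module, used here as a stand-in for np.searchsorted (both perform a binary search on sorted data). The helper names and sample labels are hypothetical, and this is an illustration of the idea rather than the code pandas runs.

```python
from bisect import bisect_left, bisect_right


def find_block_sorted(first_level, label):
    """Binary search: return (start, stop) of the run equal to `label`.

    Only valid when `first_level` is sorted.
    """
    start = bisect_left(first_level, label)
    stop = bisect_right(first_level, label)
    return start, stop


def find_positions_unsorted(first_level, label):
    """Fallback: scan every entry, since nothing is known about the order."""
    return [i for i, value in enumerate(first_level) if value == label]


sorted_level = ["bar", "bar", "baz", "foo", "foo", "foo"]
unsorted_level = ["foo", "bar", "foo", "baz", "bar", "foo"]

block = find_block_sorted(sorted_level, "foo")
positions = find_positions_unsorted(unsorted_level, "foo")

print(block)      # one contiguous slice
print(positions)  # scattered positions found by a full scan
```

With sorted data the matches form one contiguous slice found in logarithmic time; without it, every entry has to be inspected and the matches may be scattered.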

Here is the code that does this.
