Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/28419877/

Time: 2020-09-13 22:55:43  Source: igfitidea

Check whether non-index column sorted in Pandas

python pandas

Asked by nick_eu

Is there a way to test whether a dataframe is sorted by a given column that's not an index (i.e. is there an equivalent to is_monotonic() for non-index columns) without calling a sort all over again, and without converting a column into an index?

Answered by DSM

There are a handful of functions in pd.algos which might be of use. They're all undocumented implementation details, so they might change from release to release:

>>> pd.algos.is[TAB]
pd.algos.is_lexsorted          pd.algos.is_monotonic_float64  pd.algos.is_monotonic_object
pd.algos.is_monotonic_bool     pd.algos.is_monotonic_int32
pd.algos.is_monotonic_float32  pd.algos.is_monotonic_int64    

The is_monotonic_* functions take an array of the specified dtype and a "timelike" boolean that should be False for most use cases. (Pandas sets it to True for a case involving times represented as integers.) The return value is a tuple whose first element represents whether the array is monotonically non-decreasing, and whose second element represents whether the array is monotonically non-increasing. Other tuple elements are version-dependent:

>>> df = pd.DataFrame({"A": [1,2,2], "B": [2,3,1]})
>>> pd.algos.is_monotonic_int64(df.A.values, False)[0]
True
>>> pd.algos.is_monotonic_int64(df.B.values, False)[0]
False

All these functions assume a specific input dtype, even is_lexsorted, which assumes the input is a list of int64 arrays. Pass it the wrong dtype, and it gets really confused:

In [32]: pandas.algos.is_lexsorted([np.array([-2, -1], dtype=np.int64)])
Out[32]: True
In [33]: pandas.algos.is_lexsorted([np.array([-2, -1], dtype=float)])
Out[33]: False
In [34]: pandas.algos.is_lexsorted([np.array([-1, -2, 0], dtype=float)])
Out[34]: True

I'm not entirely sure why Series don't already have some kind of short-circuiting is_sorted. There might be something which makes it trickier than it seems.
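
As an illustration of what a short-circuiting check could look like, here is a minimal sketch in plain NumPy. The function name is_sorted_short_circuit and the chunk_size parameter are hypothetical, not part of pandas. It compares neighbours one chunk at a time and stops at the first out-of-order pair, and it assumes the values are mutually comparable and contain no NaN (comparisons against NaN are always False, so NaN would go unnoticed here):

```python
import numpy as np

def is_sorted_short_circuit(values, chunk_size=4096):
    """Return True if values are non-decreasing, stopping at the first violation."""
    arr = np.asarray(values)
    n = len(arr)
    # Each chunk overlaps the next by one element so that the pair
    # straddling a chunk boundary is also compared.
    for start in range(0, n - 1, chunk_size):
        stop = min(start + chunk_size + 1, n)
        chunk = arr[start:stop]
        if (chunk[1:] < chunk[:-1]).any():
            return False  # found an out-of-order pair, no need to scan further
    return True
```

Calling is_sorted_short_circuit(df["A"]) on an unsorted column only scans up to the chunk that contains the first violation; on a sorted column it still has to read everything.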

Answered by shx2

You can use the numpy method:

import numpy as np

def is_df_sorted(df, colname):
    # >= 0 accepts ties (non-decreasing); use > 0 to require strictly increasing
    return (np.diff(df[colname]) >= 0).all()
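
The np.diff approach relies on subtraction, so it only suits signed numeric columns: string columns cannot be subtracted, and unsigned integers wrap around instead of going negative. A possible variant, sketched here with a hypothetical helper name is_col_sorted and a hypothetical strict flag, compares neighbouring elements directly:

```python
import numpy as np
import pandas as pd

def is_col_sorted(df, colname, strict=False):
    """Check ordering of a column by comparing each element with its predecessor."""
    values = df[colname].to_numpy()
    if strict:
        # strictly increasing: no ties allowed
        return bool((values[1:] > values[:-1]).all())
    # non-decreasing: ties allowed
    return bool((values[1:] >= values[:-1]).all())

df = pd.DataFrame({
    "A": [1, 2, 2],
    "B": [2, 3, 1],
    "S": ["apple", "banana", "cherry"],
    "U": np.array([3, 2, 1], dtype=np.uint8),
})
a_sorted = is_col_sorted(df, "A")
a_strict = is_col_sorted(df, "A", strict=True)
b_sorted = is_col_sorted(df, "B")
s_sorted = is_col_sorted(df, "S")
u_sorted = is_col_sorted(df, "U")
```

Like the np.diff version, this builds a full boolean array before reducing it, so it does not short-circuit.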

A more direct approach (like you suggested, but you say you don't want it...) is to convert to an index and use the is_monotonic property:

import pandas as pd

def is_df_sorted(df, colname):
    # is_monotonic was an alias of is_monotonic_increasing and was removed in pandas 2.0
    return pd.Index(df[colname]).is_monotonic_increasing

Answered by Konstantin

Meanwhile, since pandas 0.19.0, there are pandas.Series.is_monotonic_increasing, pandas.Series.is_monotonic_decreasing, and pandas.Series.is_monotonic.
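
A short usage sketch of these Series properties on a non-index column follows. The example frame and variable names are made up for illustration. Both properties allow ties, so combining with is_unique is one way to test for strict ordering:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2], "B": [2, 3, 1], "C": [3, 2, 2]})

# Ascending check (ties allowed), no sorting and no set_index needed
a_sorted = df["A"].is_monotonic_increasing
b_sorted = df["B"].is_monotonic_increasing

# Descending check (ties allowed)
c_sorted_desc = df["C"].is_monotonic_decreasing

# Strictly increasing: monotonic and free of duplicates
a_strict = df["A"].is_monotonic_increasing and df["A"].is_unique
```

Here a_sorted and c_sorted_desc are True, while b_sorted and a_strict are False.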
