Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/26489134/
Time: 2020-09-13 22:35:48  Source: igfitidea

what's the inverse of the quantile function on a pandas Series?

python, pandas, quantile

Asked by Mannaggia

The quantile function gives us the quantile of a given pandas Series s.

E.g.

s.quantile(0.9) is 4.2

Is there an inverse function (i.e. the cumulative distribution function) that finds the value x such that

s.quantile(x)=4

Thanks

Answer by fernandosjp

I had the same question as you did! I found an easy way of getting the inverse of quantile using scipy.

# libs required
from scipy import stats
import pandas as pd
import numpy as np

# generate random data with the same seed (to be reproducible)
np.random.seed(seed=1)
df = pd.DataFrame(np.random.uniform(0, 1, 10), columns=['a'])

# quantile function: the median of column 'a'
x = df['a'].quantile(0.5)

# inverse of quantile: percentile rank of x, on a 0-100 scale
stats.percentileofscore(df['a'], x)
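One detail worth checking: `stats.percentileofscore` returns a percentage on a 0-100 scale, while `quantile` takes a fraction in [0, 1], so you divide by 100 to round-trip. Its `kind` argument controls how values equal to the score are counted. A minimal sketch, assuming a reasonably recent SciPy and the same seeded data as above, comparing `kind='weak'` and `kind='strict'` with a plain pandas count:

```python
import numpy as np
import pandas as pd
from scipy import stats

# same setup as above: 10 uniform values with a fixed seed
np.random.seed(seed=1)
df = pd.DataFrame(np.random.uniform(0, 1, 10), columns=['a'])

x = df['a'].quantile(0.5)

# kind='weak' counts values <= x; kind='strict' counts values < x
weak = stats.percentileofscore(df['a'], x, kind='weak')
strict = stats.percentileofscore(df['a'], x, kind='strict')

# the 'weak' figure computed directly, as a fraction in [0, 1]
frac = (df['a'] <= x).mean()

print(weak / 100, strict / 100, frac)
```

Here the median falls strictly between two data points, so both kinds agree; they differ only when x is itself one of the values.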

Answer by ILoveCoding

Sorting can be expensive; if you are looking for a single value, I'd guess you'd be better off computing it with:

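import numpy as np
import pandas as pd

s = pd.Series(np.random.uniform(size=1000))
# fraction of values below 0.7, i.e. the empirical CDF at 0.7
(s < 0.7).astype(int).mean()  # roughly 0.7 for uniform data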

There's probably a way to avoid the int(bool) shenanigan.
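On the int(bool) point: pandas treats True as 1 and False as 0 when averaging, so calling `.mean()` on the boolean Series directly should give the same fraction. A small sketch with made-up data for illustration; note that `<` and `<=` give different answers when the value itself occurs in the Series:

```python
import pandas as pd

# small illustrative Series (hypothetical data)
s = pd.Series([1, 2, 3, 4, 5])

# mean of a boolean Series = fraction of True values, no astype(int) needed
below = (s < 3).mean()         # values strictly less than 3
at_or_below = (s <= 3).mean()  # values <= 3, i.e. the empirical CDF at 3

print(below, at_or_below)  # 0.4 0.6
```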

Answer by Mike

There's no 1-liner that I know of, but you can achieve this with scipy:

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

# set up a sample dataframe
df = pd.DataFrame(np.random.uniform(0,1,(11)), columns=['a'])
# sort it by the desired series and calculate each row's quantile position
# (DataFrame.sort was removed from pandas; sort_values replaces it)
sdf = df.sort_values('a').reset_index()
sdf['b'] = sdf.index / float(len(sdf) - 1)
# set up the interpolator with the values as x and the positions as y
interp = interp1d(sdf['a'], sdf['b'])

# a is the value, b is the percentile
>>> sdf
    index         a    b
0      10  0.030469  0.0
1       3  0.144445  0.1
2       4  0.304763  0.2
3       1  0.359589  0.3
4       7  0.385524  0.4
5       5  0.538959  0.5
6       8  0.642845  0.6
7       6  0.667710  0.7
8       9  0.733504  0.8
9       2  0.905646  0.9
10      0  0.961936  1.0

Now we can see that the two functions are inverses of each other.

>>> df['a'].quantile(0.57)
0.61167933268395969
>>> interp(0.61167933268395969)
array(0.57)
>>> interp(df['a'].quantile(0.43))
array(0.43)

interp can also take a list, a NumPy array, or a pandas Series; any array-like input, really!
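If you'd rather not depend on SciPy (recent SciPy documentation marks `interp1d` as legacy), the same idea can be sketched with NumPy's `np.interp`. This assumes pandas' default `quantile(interpolation='linear')` and a Series with no repeated values; `inverse_quantile` is a hypothetical helper name, not a pandas API:

```python
import numpy as np
import pandas as pd


def inverse_quantile(s, value):
    """Hypothetical helper: the fraction q such that s.quantile(q) == value.

    Assumes s has no repeated values and mirrors pandas' default
    linear interpolation in Series.quantile.
    """
    xp = np.sort(s.to_numpy())
    # the i-th smallest value sits at quantile position i / (n - 1)
    fp = np.linspace(0.0, 1.0, len(xp))
    # np.interp clips values outside [min, max] to 0.0 and 1.0
    return np.interp(value, xp, fp)


rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0, 1, 11))
print(inverse_quantile(s, s.quantile(0.57)))  # approximately 0.57
```

One design difference: by default `interp1d` raises a ValueError for values outside the data range, whereas `np.interp` silently returns the end values 0.0 and 1.0.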

Answer by Calvin Ku

Just came across the same problem. Here's my two cents.

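def inverse_percentile(arr, num):
    # sort the values so that positions correspond to ranks
    arr = sorted(arr)
    # indices of all values strictly greater than num
    i_arr = [i for i, x in enumerate(arr) if x > num]

    # the first such index equals the count of values <= num,
    # so dividing by len(arr) gives the fraction of values <= num
    return i_arr[0] / len(arr) if len(i_arr) > 0 else 1.0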
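The list comprehension scans every element even though `arr` is already sorted. The index of the first element greater than `num` is exactly what `bisect.bisect_right` returns, so a binary-search variant should give the same result. A minimal sketch; the name `inverse_percentile_bisect` is hypothetical, and unlike the function above it raises on empty input instead of returning 1:

```python
from bisect import bisect_right


def inverse_percentile_bisect(arr, num):
    """Fraction of values in arr that are <= num (hypothetical helper)."""
    arr = sorted(arr)
    if not arr:
        raise ValueError("arr must be non-empty")
    # bisect_right gives the index of the first element > num,
    # i.e. the count of elements <= num (len(arr) if there is none)
    return bisect_right(arr, num) / len(arr)


print(inverse_percentile_bisect([3, 1, 2, 5, 4], 3))   # 0.6
print(inverse_percentile_bisect([3, 1, 2, 5, 4], 10))  # 1.0
```

For a single query the sort still dominates; the saving matters when you sort once and reuse the sorted list for many lookups.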

Answer by u3397819

Mathematically speaking, you're trying to find the CDF, i.e. the probability of s being smaller than or equal to a value (or quantile) q:

F(q) = Pr[s <= q]
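With only a sample s_1, ..., s_n in hand, F is estimated by the empirical CDF: the fraction of observations at or below q, which is exactly what averaging a boolean array computes:

```latex
\hat{F}_n(q) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[ s_i \le q \right]
```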

One can use numpy and try this one-line code:

np.mean(s.to_numpy() <= q)
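The one-liner handles a single q. For many query values at once, a sketch that sorts once and uses `np.searchsorted` with `side='right'` (which returns, for each query, the number of sorted values <= it) should give the same numbers; `ecdf` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def ecdf(s, qs):
    """Empirical CDF of s evaluated at each value in qs (hypothetical helper)."""
    values = np.sort(s.to_numpy())
    # side='right' returns, for each q, the number of values <= q
    counts = np.searchsorted(values, np.asarray(qs), side='right')
    return counts / len(values)


s = pd.Series([1, 2, 3, 4, 5])
print(ecdf(s, [0, 2.5, 3, 10]))  # fractions 0.0, 0.4, 0.6, 1.0
```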