What is the inverse of the quantile function on a pandas Series?

Question

Asked by Mannaggia

The quantile function gives us the quantile of a given pandas Series s,

E.g.

s.quantile(0.9) is 4.2

Is there an inverse function (i.e. the cumulative distribution) which finds the value x such that

s.quantile(x)=4

Thanks

Answer 1

Answered by fernandosjp

I had the same question as you did! I found an easy way of getting the inverse of the quantile using scipy.

# libraries required
from scipy import stats
import pandas as pd
import numpy as np

# generate random data with a fixed seed (to be reproducible)
np.random.seed(seed=1)
df = pd.DataFrame(np.random.uniform(0, 1, 10), columns=['a'])

# quantile function: the median of column 'a'
x = df['a'].quantile(0.5)

# inverse of quantile: returns a percentage in [0, 100], not a fraction
stats.percentileofscore(df['a'], x)
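
One detail worth spelling out: stats.percentileofscore reports a percentage in [0, 100] rather than a fraction, and its kind argument decides how values tied with the score are counted. A small sketch on made-up data (the values below are illustrative, not taken from the question):

```python
from scipy import stats

data = [1, 2, 3, 3, 4]  # illustrative values with a tie at 3

# 'strict' counts values < score, 'weak' counts values <= score
strict = stats.percentileofscore(data, 3, kind='strict')
weak = stats.percentileofscore(data, 3, kind='weak')

# divide by 100 to get a fraction comparable to the argument of s.quantile
strict_fraction = strict / 100
weak_fraction = weak / 100
print(strict_fraction, weak_fraction)
```

Which kind is appropriate depends on whether the CDF is meant as Pr[s < x] or Pr[s <= x]; the two only differ when x occurs in the data.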

Answer 2

Answered by ILoveCoding

Sorting can be expensive. If you are looking for a single value, I'd guess you'd be better off computing it with:

s = pd.Series(np.random.uniform(size=1000))
( s < 0.7 ).astype(int).mean() # =0.7ish

The astype(int) cast can be dropped: (s < 0.7).mean() gives the same result, since the mean of a boolean Series counts True as 1 and False as 0.

Answer 3

Answered by Mike

There's no one-liner that I know of, but you can achieve this with scipy:

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

# set up a sample dataframe
df = pd.DataFrame(np.random.uniform(0, 1, 11), columns=['a'])
# sort it by the desired series and calculate the percentile
# (DataFrame.sort was removed from pandas; sort_values replaces it)
sdf = df.sort_values('a').reset_index()
sdf['b'] = sdf.index / float(len(sdf) - 1)
# set up the interpolator using the value as the index
interp = interp1d(sdf['a'], sdf['b'])

# a is the value, b is the percentile
>>> sdf
    index         a    b
0      10  0.030469  0.0
1       3  0.144445  0.1
2       4  0.304763  0.2
3       1  0.359589  0.3
4       7  0.385524  0.4
5       5  0.538959  0.5
6       8  0.642845  0.6
7       6  0.667710  0.7
8       9  0.733504  0.8
9       2  0.905646  0.9
10      0  0.961936  1.0

Now we can see that the two functions are inverses of each other.

>>> df['a'].quantile(0.57)
0.61167933268395969
>>> interp(0.61167933268395969)
array(0.57)
>>> interp(df['a'].quantile(0.43))
array(0.43)

interp can also take a list, a NumPy array, or a pandas Series: any array-like, really!
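
The same lookup can probably be done without scipy. A minimal sketch using np.interp, assuming the default linear interpolation of Series.quantile and no NaNs in the data; the helper name inverse_quantile is made up for this example:

```python
import numpy as np
import pandas as pd

def inverse_quantile(s, value):
    """Return q such that s.quantile(q) is approximately `value`.

    Hypothetical helper: assumes the default linear interpolation of
    Series.quantile and no NaNs in `s`.
    """
    a = np.sort(s.to_numpy())
    # position i of the sorted sample corresponds to quantile i / (n - 1)
    q = np.linspace(0.0, 1.0, len(a))
    # np.interp clamps to 0 or 1 outside the data range instead of raising
    return np.interp(value, a, q)

rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0, 1, 11))
x = s.quantile(0.57)
print(inverse_quantile(s, x))  # close to 0.57
```

Unlike interp1d with its default settings, which raises an error for values outside the observed range, this version returns 0 or 1 there.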

Answer 4

Answered by Calvin Ku

Just came across the same problem. Here's my two cents.

def inverse_percentile(arr, num):
    arr = sorted(arr)
    i_arr = [i for i, x in enumerate(arr) if x > num]

    return i_arr[0] / len(arr) if len(i_arr) > 0 else 1
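
If many values have to be looked up, the function above re-sorts and rescans the data on every call. A possible alternative, sketched with np.searchsorted, computes the same quantity (the fraction of elements less than or equal to each value) for a whole batch of queries; the name fraction_at_or_below is hypothetical:

```python
import numpy as np

def fraction_at_or_below(arr, nums):
    """Fraction of elements in `arr` that are <= each entry of `nums`."""
    a = np.sort(np.asarray(arr))
    # side='right' gives, for each query, the number of elements <= it
    counts = np.searchsorted(a, nums, side='right')
    return counts / len(a)

result = fraction_at_or_below([5, 1, 3, 3, 9], [0, 3, 4, 10])
print(result)
```

The data is sorted once, and each lookup is then a binary search rather than a full pass.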

Answer 5

Answered by u3397819

Mathematically speaking, you're trying to find the CDF, i.e. the probability of s being smaller than or equal to a value (quantile) q:

F(q) = Pr[s <= q]

One can use numpy and try this one-line code:

np.mean(s.to_numpy() <= q)
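
To evaluate F at every observed value at once, one option is Series.rank. A sketch assuming no NaNs in the data (the sample values are made up):

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, 1.0, 2.0, 2.0])  # illustrative data

# method='max' gives tied values the highest rank in the tie, so the
# rank equals the number of elements <= that value; pct=True divides by n
cdf_at_points = s.rank(method='max', pct=True)

# same quantity computed one value at a time with the one-liner above
check = [np.mean(s.to_numpy() <= q) for q in s]
print(cdf_at_points.tolist(), check)
```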
