pandas: compute the percentile of dataframe rows based on data from previous years

Question

Asked by user308827

I have the following dataframe:

datetime       JD      YEAR    VAL 
2000-01-01      1      2000    0.5
2000-01-02      2      2000    1.2
2000-01-03      3      2000    2.1
2000-01-04      4      2000    3.4 
2000-01-05      5      2000    4.6
2000-01-06      6      2000    6.8
2000-01-07      7      2000    7.2
2000-01-08      8      2000    0.2
2000-01-09      9      2000    0.9
...
2010-12-31      365    2010    4.1

The first year is 2000 and the last year is 2010. Leap days have been dropped (i.e. there is no row for Feb 29th), and datetime is the index column.

I would like to compute a new dataframe, stretching from Jan 1st 2010 to Dec 31st 2010. It should contain a column holding the percentile of the Jan 1st 2010 value (VAL) within the array of 10 values from the same day in previous years (Jan 1st 2000, Jan 1st 2001 ... Jan 1st 2009). Similarly, Jan 2nd 2010 is compared against Jan 2nd from previous years, and so on. This is what I have tried:

from scipy import stats

lyr = df.YEAR.max()            # last year, i.e. 2010
cdf = df[df.YEAR == lyr]       # latest-year dataframe
pdf = df[df.index.year < lyr]  # previous-years dataframe

pdf.groupby('JD')['VAL']       # returns a GroupBy object, not the values
stats.percentileofscore(pdf['VAL'], cdf['VAL'])

However, I am not sure how to get the code to work. The groupby only returns a GroupBy object, whereas I need a list of values for each JD.

Answer 1

Answered by JohnE

Set up with a small sample dataframe:

import numpy as np
import pandas as pd

np.random.seed(1234)
df = pd.DataFrame({ 'jd':  np.tile([1,2],3),
                    'yr':  np.repeat([2008,2009,2010],2),
                    'val': np.random.randn(6) })

Then it's just one line:

df['pctile'] = df.groupby('jd')['val'].rank(pct=True)

Here's the output, sorted with sort_values(['jd','val']):

   jd       val    yr    pctile
4   1 -0.720589  2010  0.333333
0   1  0.471435  2008  0.666667
2   1  1.432707  2009  1.000000
1   2 -1.190976  2008  0.333333
3   2 -0.312652  2009  0.666667
5   2  0.887163  2010  1.000000
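
Note that rank(pct=True) ranks each value among all years for that day, including the latest year itself. If the latest year should be scored strictly against the previous years, as the question describes, something along these lines might work. This is only a sketch on the same sample data: the names last_yr, prev, cur, merged and pct are made up for illustration, and the percentile is assumed to mean "share of previous-year values less than or equal to the current value" (what scipy.stats.percentileofscore calls kind='weak').

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
np.random.seed(1234)
df = pd.DataFrame({'jd':  np.tile([1, 2], 3),
                   'yr':  np.repeat([2008, 2009, 2010], 2),
                   'val': np.random.randn(6)})

last_yr = df['yr'].max()
prev = df[df['yr'] < last_yr]    # reference years
cur = df[df['yr'] == last_yr]    # year to be scored

# Pair every previous-year value with the latest-year value for the same day.
merged = prev.merge(cur, on='jd', suffixes=('_prev', '_cur'))

# Share of previous-year values <= the current value, per day, on a 0-100 scale.
pct = (merged['val_prev'] <= merged['val_cur']).groupby(merged['jd']).mean() * 100
print(pct)
```

On this sample, the 2010 value for jd 1 is below both earlier values and the 2010 value for jd 2 is above both, so the result is 0 for jd 1 and 100 for jd 2. A different percentile definition (strictly less than, or an average of the two) would only require changing the comparison.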
