Rank data over a rolling window in a pandas DataFrame
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/14440187/
Asked by FrankDR
I am new to Python and the pandas library, so apologies if this is a trivial question. I am trying to rank a time series over a rolling window of N days. I know there is a rank function, but it ranks the data over the entire time series. I can't seem to find a rolling rank function. Here is an example of what I am trying to do:
A
01-01-2013 100
02-01-2013 85
03-01-2013 110
04-01-2013 60
05-01-2013 20
06-01-2013 40
If I wanted to rank the data over a rolling window of 3 days, the answer should be:
Ranked_A
01-01-2013 NaN
02-01-2013 NaN
03-01-2013 1
04-01-2013 3
05-01-2013 3
06-01-2013 2
Is there a built-in function in Python that can do this? Any suggestion? Many thanks.
Accepted answer by metakermit
If you want to use the pandas built-in rank method (with some additional semantics, such as the ascending option), you can create a simple function wrapper for it:
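import pandas as pd
def rank(array):
    s = pd.Series(array)
    # Rank of the last element in the window; ascending=False makes the largest value rank 1
    return s.rank(ascending=False).iloc[-1]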
that can then be used as a custom rolling-window function.
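# Originally pd.rolling_apply(df['A'], 3, rank); that function has since been removed from pandas
df['A'].rolling(3).apply(rank, raw=True)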
which outputs
Date
01-01-2013 NaN
02-01-2013 NaN
03-01-2013 1
04-01-2013 3
05-01-2013 3
06-01-2013 2
(I'm assuming the df data structure from Rutger's answer.)
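For readers on a recent pandas release, the wrapper above may not be needed at all. The following is a minimal sketch, assuming pandas 1.4 or later, where rolling windows expose a rank method directly. The sample values are taken from the question; the variable names s and ranked are only illustrative.

```python
import pandas as pd

# Same sample data as in the question; the name "A" follows the original column.
s = pd.Series(
    [100, 85, 110, 60, 20, 40],
    index=["01-01-2013", "02-01-2013", "03-01-2013",
           "04-01-2013", "05-01-2013", "06-01-2013"],
    name="A",
)

# Assumption: pandas >= 1.4, where Rolling.rank is available.
# ascending=False makes the largest value in each window rank 1,
# and each row receives the rank of that row's value within its own window.
ranked = s.rolling(3).rank(ascending=False)
print(ranked)
```

The first two rows come out as NaN because the window is not yet full, which matches the output shown in the answer. On older pandas versions the custom-function approach remains the way to go.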
Answer by Rutger Kassies
You can write a custom function for a rolling window in pandas. Using NumPy's argsort() in that function gives you the rank within the window:
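import pandas as pd
from io import StringIO  # Python 3; the original used the Python 2 StringIO module
testdata = StringIO("""
Date,A
01-01-2013,100
02-01-2013,85
03-01-2013,110
04-01-2013,60
05-01-2013,20
06-01-2013,40""")
# The header row is detected by default, so no header argument is needed
df = pd.read_csv(testdata, index_col=['Date'])
# Rank of the last value in the window, where 1 is the largest
rollrank = lambda data: data.size - data.argsort().argsort()[-1]
# pd.rolling_apply has been removed from pandas; .rolling(...).apply(...) replaces it
df['rank'] = df['A'].rolling(3).apply(rollrank, raw=True)
print(df)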
results in:
A rank
Date
01-01-2013 100 NaN
02-01-2013 85 NaN
03-01-2013 110 1
04-01-2013 60 3
05-01-2013 20 3
06-01-2013 40 2
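The double argsort in the answer above can be hard to read at first, so here is a small sketch that unpacks it on a single window. The window values are the three days ending on 04-01-2013 from the sample data; the names order, ranks0 and last_rank are only illustrative.

```python
import numpy as np

# One 3-day window from the sample data (02-01, 03-01, 04-01).
window = np.array([85, 110, 60])

# First argsort: the indices that would sort the window in ascending order.
order = window.argsort()

# Second argsort: the 0-based ascending rank of each element.
ranks0 = order.argsort()

# Flip to a 1-based descending rank for the last element (the current day).
last_rank = window.size - ranks0[-1]
print(order, ranks0, last_rank)
```

Here the last value, 60, is the smallest of the three, so its 0-based ascending rank is 0 and its descending rank is 3, matching the row for 04-01-2013. Note that argsort does not average tied values the way the pandas rank method does by default, so the two approaches can differ when a window contains duplicates.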

