Python Pandas equivalent of Oracle Lead/Lag functions
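Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow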
Original URL: http://stackoverflow.com/questions/23664877/
Asked by gcarmiol
First, I'm new to pandas, but I'm already falling in love with it. I'm trying to implement the equivalent of Oracle's LAG function.
Let's suppose you have this DataFrame:
Date                 Group  Data
2014-05-14 09:10:00  A      1
2014-05-14 09:20:00  A      2
2014-05-14 09:30:00  A      3
2014-05-14 09:40:00  A      4
2014-05-14 09:50:00  A      5
2014-05-14 10:00:00  B      1
2014-05-14 10:10:00  B      2
2014-05-14 10:20:00  B      3
2014-05-14 10:30:00  B      4
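To follow along, the sample frame above can be rebuilt with a short script. This is a sketch, assuming Date should be a real datetime column rather than a string; the evenly spaced 10-minute timestamps are read off the table, and the name df matches the one used in the accepted answer.

```python
import pandas as pd

# Rebuild the question's sample data; the nine timestamps are 10 minutes apart
df = pd.DataFrame({
    "Date": pd.date_range("2014-05-14 09:10", periods=9, freq="10min"),
    "Group": ["A"] * 5 + ["B"] * 4,
    "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
})
print(df)
```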
If this were an Oracle database and I wanted a lagged column partitioned by the "Group" column and ordered by Date, I could simply use this analytic function:
LAG(Data,1,NULL) OVER (PARTITION BY Group ORDER BY Date ASC) AS Data_lagged
This would produce the following table:
Date                 Group  Data  Data_lagged
2014-05-14 09:10:00  A      1     NULL
2014-05-14 09:20:00  A      2     1
2014-05-14 09:30:00  A      3     2
2014-05-14 09:40:00  A      4     3
2014-05-14 09:50:00  A      5     4
2014-05-14 10:00:00  B      1     NULL
2014-05-14 10:10:00  B      2     1
2014-05-14 10:20:00  B      3     2
2014-05-14 10:30:00  B      4     3
In pandas, I can set Date as the index and use the shift method:
db["Data_lagged"] = db.Data.shift(1)
The only issue is that this doesn't group by a column: shift simply moves every value down one row. Even if I set both Date and Group as the index, I would still get the "5" (the last value of group A) in the lagged column of the first group B row.
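The grouping problem is easy to reproduce. The sketch below rebuilds the sample data (assuming Date is parsed as a datetime, with the 10-minute spacing read off the table), sets Date and Group as the index as described, and applies a plain shift:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.date_range("2014-05-14 09:10", periods=9, freq="10min"),
    "Group": ["A"] * 5 + ["B"] * 4,
    "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
}).set_index(["Date", "Group"])

# A plain shift ignores the Group index level: each row simply takes
# the value from the row physically above it.
df["Data_lagged"] = df["Data"].shift(1)
print(df)
```

Because shift works purely on row position, the row at position 5 (the first B row) picks up 5, the last value of group A, instead of a missing value.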
Is there a way to implement the equivalent of the LEAD and LAG functions in pandas?
Accepted answer by unutbu
You could perform a groupby/shift operation, which applies the shift within each group separately:
In [15]: df['Data_lagged'] = df.groupby(['Group'])['Data'].shift(1)
In [16]: df
Out[16]:
Date                 Group  Data  Data_lagged
2014-05-14 09:10:00  A      1     NaN
2014-05-14 09:20:00  A      2     1
2014-05-14 09:30:00  A      3     2
2014-05-14 09:40:00  A      4     3
2014-05-14 09:50:00  A      5     4
2014-05-14 10:00:00  B      1     NaN
2014-05-14 10:10:00  B      2     1
2014-05-14 10:20:00  B      3     2
2014-05-14 10:30:00  B      4     3
[9 rows x 4 columns]
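The question also asks about LEAD. With the same groupby, a negative period looks ahead instead of back, so shift(-1) plays the role of LEAD(Data, 1) within each group. A minimal sketch on the sample data, assuming the rows are already in Date order inside each group; the column name Data_lead is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.date_range("2014-05-14 09:10", periods=9, freq="10min"),
    "Group": ["A"] * 5 + ["B"] * 4,
    "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
})

# LAG(Data, 1) OVER (PARTITION BY Group ...): one row back within each group
df["Data_lagged"] = df.groupby("Group")["Data"].shift(1)
# LEAD(Data, 1) OVER (PARTITION BY Group ...): one row ahead within each group
df["Data_lead"] = df.groupby("Group")["Data"].shift(-1)
print(df)
```

The last row of each group gets NaN in Data_lead, just as the first row of each group gets NaN in Data_lagged.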
To obtain the ORDER BY Date ASC effect, you must sort the DataFrame first (when the result is assigned back, pandas aligns it to the original rows by index):
df['Data_lagged'] = (df.sort_values(by=['Date'], ascending=True)
                       .groupby(['Group'])['Data'].shift(1))
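The sort-then-shift idea can be wrapped up so that Oracle's optional offset and default arguments of LAG(value, offset, default) map onto the periods and fill_value parameters of shift. The helper oracle_lag below is hypothetical (not a pandas API) and is only a sketch; it assumes the DataFrame has a unique index, because the result is realigned to the caller's rows by index label. A negative offset gives LEAD-like behaviour, and the rows are deliberately shuffled to show that the sort, not the physical row order, decides what "previous" means.

```python
import pandas as pd


def oracle_lag(df, col, partition_by, order_by, offset=1, default=None):
    """Hypothetical helper approximating
    LAG(col, offset, default) OVER (PARTITION BY partition_by ORDER BY order_by ASC).
    A negative offset behaves like LEAD(col, -offset, default)."""
    # ORDER BY ... ASC: sort first so "previous row" means previous in order_by
    ordered = df.sort_values(by=order_by, ascending=True)
    # PARTITION BY ...: shift only within each group
    grouped = ordered.groupby(partition_by)[col]
    if default is None:
        shifted = grouped.shift(offset)  # out-of-range rows become NaN (Oracle: NULL)
    else:
        shifted = grouped.shift(offset, fill_value=default)
    # Restore the caller's row order (assumes a unique index)
    return shifted.reindex(df.index)


data = pd.DataFrame({
    "Date": pd.date_range("2014-05-14 09:10", periods=9, freq="10min"),
    "Group": ["A"] * 5 + ["B"] * 4,
    "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
})
# Shuffle the rows so that physical row order no longer matches Date order
shuffled = data.sample(frac=1, random_state=0)
shuffled["Data_lagged"] = oracle_lag(shuffled, "Data", "Group", "Date")
shuffled["Data_lagged_0"] = oracle_lag(shuffled, "Data", "Group", "Date", default=0)
shuffled["Data_lead"] = oracle_lag(shuffled, "Data", "Group", "Date", offset=-1)
print(shuffled.sort_values("Date"))
```

Passing a default keeps the integer dtype, since no NaN has to be introduced; leaving it as None mirrors LAG(Data, 1, NULL) from the question.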