Python Pandas equivalent of the Oracle Lead/Lag function

Question

Asked by gcarmiol

First, I'm new to pandas, but I'm already falling in love with it. I'm trying to implement the equivalent of Oracle's LAG function.

Let's suppose you have this DataFrame:

Date                   Group      Data
2014-05-14 09:10:00        A         1
2014-05-14 09:20:00        A         2
2014-05-14 09:30:00        A         3
2014-05-14 09:40:00        A         4
2014-05-14 09:50:00        A         5
2014-05-14 10:00:00        B         1
2014-05-14 10:10:00        B         2
2014-05-14 10:20:00        B         3
2014-05-14 10:30:00        B         4

If this were an Oracle database and I wanted a lagged column partitioned by the "Group" column and ordered by Date, I could simply use this expression:

 LAG(Data,1,NULL) OVER (PARTITION BY Group ORDER BY Date ASC) AS Data_lagged

This would result in the following table:

Date                   Group     Data    Data_lagged
2014-05-14 09:10:00        A        1           Null
2014-05-14 09:20:00        A        2            1
2014-05-14 09:30:00        A        3            2
2014-05-14 09:40:00        A        4            3
2014-05-14 09:50:00        A        5            4
2014-05-14 10:00:00        B        1           Null
2014-05-14 10:10:00        B        2            1
2014-05-14 10:20:00        B        3            2
2014-05-14 10:30:00        B        4            3

In pandas I can set the date to be an index and use the shift method:

db["Data_lagged"] = db.Data.shift(1)

The only issue is that this doesn't group by a column. Even if I set both Date and Group as the index, the first row of group B would still get the "5" from the last row of group A in the lagged column, instead of a null.
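
As a minimal sketch of the problem, assuming the sample table is rebuilt by hand (the construction with `pd.date_range` is an assumption for illustration, not part of the original post), a plain `shift` carries the last value of group A into the first row of group B:

```python
import pandas as pd

# Sample data rebuilt from the table above; Date is used as the index,
# as described in the question.
db = pd.DataFrame(
    {
        "Group": list("AAAAABBBB"),
        "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    },
    index=pd.date_range("2014-05-14 09:10", periods=9, freq="10min"),
)

# A plain shift ignores the Group column entirely.
db["Data_lagged"] = db["Data"].shift(1)

# First row of group B: the lagged value comes from group A.
print(db.loc[db["Group"] == "B", "Data_lagged"].iloc[0])  # 5.0
```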

Is there a way to implement the equivalent of the Lead and lag functions in Pandas?

Answer 1

Accepted answer by unutbu

You could perform a groupby/apply (shift) operation:

In [15]: df['Data_lagged'] = df.groupby(['Group'])['Data'].shift(1)

In [16]: df
Out[16]: 
                Date Group  Data  Data_lagged
2014-05-14  09:10:00     A     1          NaN
2014-05-14  09:20:00     A     2            1
2014-05-14  09:30:00     A     3            2
2014-05-14  09:40:00     A     4            3
2014-05-14  09:50:00     A     5            4
2014-05-14  10:00:00     B     1          NaN
2014-05-14  10:10:00     B     2            1
2014-05-14  10:20:00     B     3            2
2014-05-14  10:30:00     B     4            3

[9 rows x 4 columns]

To obtain the ORDER BY Date ASC effect, you must sort the DataFrame first:

df['Data_lagged'] = (df.sort_values(by=['Date'], ascending=True)
                       .groupby(['Group'])['Data'].shift(1))
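
The question also asks about LEAD. The answer above only shows the lag, but the same pattern should cover the lead by shifting in the other direction with `shift(-1)`. The following is a sketch under stated assumptions: the sample frame is rebuilt by hand, the column names `Data_lead` and `Data_lagged_0` are made up for illustration, and `fill_value` assumes a pandas version whose groupby `shift` accepts that argument. It plays roughly the role of the third argument of Oracle's `LAG`.

```python
import pandas as pd

# Sample data rebuilt from the question's table (Date kept as a column here).
df = pd.DataFrame(
    {
        "Date": pd.date_range("2014-05-14 09:10", periods=9, freq="10min"),
        "Group": list("AAAAABBBB"),
        "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    }
)

# Sort once so "previous" and "next" mean previous/next by Date in each group.
grouped = df.sort_values("Date").groupby("Group")["Data"]

df["Data_lagged"] = grouped.shift(1)    # like LAG(Data, 1)
df["Data_lead"] = grouped.shift(-1)     # like LEAD(Data, 1)

# Hypothetical column: use 0 instead of NaN where no previous row exists.
df["Data_lagged_0"] = grouped.shift(1, fill_value=0)

print(df)
```

The shifted results keep the index of the rows they came from, so assigning them back to `df` lines them up by index label and leaves the row order of `df` unchanged. This relies on the index being unique.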
