Pandas dataframe apply: refer to the previous row to calculate the difference

Question

Asked by beta

I have the following pandas dataframe containing 2 columns (simplified). The first column contains player names and the second column contains dates (datetime objects):

  player    date
  A         2010-01-01
  A         2010-01-09
  A         2010-01-11
  A         2010-01-15
  B         2010-02-01
  B         2010-02-10
  B         2010-02-21
  B         2010-02-23
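
For anyone who wants to reproduce the examples on this page, the table above can be built roughly as follows. This is only a sketch: the variable name df matches the one used later in the question, and parsing the dates with pd.to_datetime is an assumption about how the datetime objects were created.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime([
        "2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
        "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23",
    ]),
})

# The date column should have a datetime64 dtype, not object.
print(df.dtypes)
```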

I want to add a column diff which represents the time difference in days per player. The result should look like this:

  player    date            diff
  A         2010-01-01      0
  A         2010-01-09      8
  A         2010-01-11      2
  A         2010-01-15      4
  B         2010-02-01      0
  B         2010-02-10      9
  B         2010-02-21      11
  B         2010-02-23      2

The first row has 0 for diff, because there is no earlier date. The second row shows 8, because the difference between 2010-01-01 and 2010-01-09 is eight days.

The problem is not calculating the day difference between two datetime objects; I am just not sure how to add the new column. I know that I have to do a groupby first (df.groupby('player')) and then use apply (or maybe transform?). However, I am stuck because, to calculate the difference, I need to refer to the previous row inside the apply function, and I don't know how to do that, if it is possible at all.

Thank you very much.

UPDATE: After trying both proposed solutions below, I found that they did not work with my code. After much headache, I discovered that my data had duplicate indices. A simple df.reset_index() solved my issue, and the proposed solutions then worked. Since both solutions work but I can only mark one as correct, I will choose the more concise one. Thanks to both of you, though!
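
A small sketch of the duplicate-index situation described in this update. The frame below is hypothetical (the original data is not shown), and passing drop=True to reset_index is a choice made here so that the old labels are not kept as an extra column; the update itself only mentions df.reset_index().

```python
import pandas as pd

# Hypothetical frame whose index labels repeat, e.g. after concatenating two frames.
df = pd.DataFrame(
    {
        "player": ["A", "A", "B", "B"],
        "date": pd.to_datetime(
            ["2010-01-01", "2010-01-09", "2010-02-01", "2010-02-10"]
        ),
    },
    index=[0, 1, 0, 1],
)

# False here means at least one index label occurs more than once.
print(df.index.is_unique)

# Replace the repeated labels with a fresh 0..n-1 index.
df = df.reset_index(drop=True)
print(df.index.is_unique)

# With unique labels, the per-player difference can be assigned back safely.
df["difference"] = df.groupby("player")["date"].diff()
print(df)
```

Checking df.index.is_unique first is a cheap way to find out whether this is the cause before trying anything else.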

Answer 1

Accepted answer by Alex Riley

You can simply write:

import pandas as pd
df['difference'] = df.groupby('player')['date'].diff().fillna(pd.Timedelta(0))

This gives the new timedelta column with the correct values:

  player       date  difference
0      A 2010-01-01      0 days
1      A 2010-01-09      8 days
2      A 2010-01-11      2 days
3      A 2010-01-15      4 days
4      B 2010-02-01      0 days
5      B 2010-02-10      9 days
6      B 2010-02-21     11 days
7      B 2010-02-23      2 days

(I've used the name "difference" instead of "diff" to distinguish the name from the method diff.)
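
The question asks for plain day counts (0, 8, 2, ...) rather than timedelta values. One possible way to get them, sketched here under the assumption that the data matches the question's table, is to fill the missing first value per player with a zero-length pd.Timedelta and then read the .dt.days accessor. The intermediate name delta is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime([
        "2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
        "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23",
    ]),
})

# Per-player difference to the previous row; the first row of each player is NaT.
delta = df.groupby("player")["date"].diff()

# Replace NaT with a zero-length Timedelta, then keep only the whole-day count.
df["difference"] = delta.fillna(pd.Timedelta(0)).dt.days
print(df)
```

Filling with pd.Timedelta(0) instead of the integer 0 keeps the column a timedelta until .dt.days is applied, which avoids mixing integers into a timedelta column.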

Answer 2

Answer by Nader Hisham

Another way, if you want to implement it manually, is the following:

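def date_diff(group):
    # Work on a copy so the group passed in by groupby is not modified in place.
    group = group.copy()
    group['difference'] = group['date'] - group['date'].shift()
    # The first row of each player has no previous date; use a zero-length timedelta.
    group['difference'] = group['difference'].fillna(pd.Timedelta(0))
    return group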

In [30]:
df_final = df.groupby(df['player']).apply(date_diff)
df_final
Out[30]:
player  date    difference
A   2010-01-01  0 days
A   2010-01-09  8 days
A   2010-01-11  2 days
A   2010-01-15  4 days
B   2010-02-01  0 days
B   2010-02-10  9 days
B   2010-02-21  11 days
B   2010-02-23  2 days

Answer 3

Answer by MiaeKim

shift() is a good function; however, if you need to avoid data duplication, I would suggest the following method.

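import numpy as np

def date_diff(row):
    # Positional location of this row; assumes df has a unique index.
    index = df.index.get_loc(row.name)
    if index == 0:
        return np.nan
    prev_row = df.iloc[index - 1]
    # Only compare rows of the same player; a player's first row has no previous date.
    if prev_row['player'] != row['player']:
        return np.nan
    return row['date'] - prev_row['date']

df['difference'] = df.apply(date_diff, axis=1)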
