pandas: How do I shift dates in a pandas dataframe (add x months)?

Question

Asked by Pythonista anonymous

I have a dataframe with columns of dates.

I know how to shift dates by a fixed number of months (e.g., add 3 months to all the dates in column x). However, I cannot figure out how to shift dates by a number of months that is not fixed but comes from another column of the dataframe.

Any ideas?

I have copied a minimal example below. The error I get is:

The truth value of a Series is ambiguous

Thanks a lot!

import pandas as pd
import numpy as np
import datetime

df = pd.DataFrame()
df['year'] = np.arange(2000,2010)
df['month'] = 3

df['mydate'] = pd.to_datetime(  (df.year * 10000 + df.month * 100 +1).apply(str), format='%Y%m%d')
df['month shift'] = np.arange(0,10)

# if I want to shift mydate by 3 months, I can convert it to DatetimeIndex and use dateOffset:
df['my date shifted by 3 months'] = pd.DatetimeIndex( df['mydate'] ) + pd.DateOffset(months = 3)

# however, how do I shift mydate by the number of months in the column 'month shift'?
#This does NOT work:
df['my date shifted'] = pd.DatetimeIndex( df['mydate'] ) + pd.DateOffset(months = df['month shift'])
print(df)

Answer 1

Accepted answer by Anton Protopopov

IIUC, you could use apply with axis=1:

In [23]: df.apply(lambda x: x['mydate'] + pd.DateOffset(months = x['month shift']), axis=1)
Out[23]:
0   2000-03-01
1   2001-04-01
2   2002-05-01
3   2003-06-01
4   2004-07-01
5   2005-08-01
6   2006-09-01
7   2007-10-01
8   2008-11-01
9   2009-12-01
dtype: datetime64[ns]
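
For readers who want to run the apply approach end to end, here is a minimal self-contained sketch. It rebuilds the question's dataframe (using pd.to_datetime on year/month/day columns instead of string formatting, which is a substitution of mine) and casts the month count to a plain int before handing it to pd.DateOffset, as a precaution against NumPy integer types.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data: 1 March of each year 2000..2009.
df = pd.DataFrame({"year": np.arange(2000, 2010), "month": 3})
df["mydate"] = pd.to_datetime(df[["year", "month"]].assign(day=1))
df["month shift"] = np.arange(0, 10)

# Row-wise: build one DateOffset per row from that row's month count.
df["my date shifted"] = df.apply(
    lambda row: row["mydate"] + pd.DateOffset(months=int(row["month shift"])),
    axis=1,
)
print(df[["mydate", "month shift", "my date shifted"]])
```

Because apply with axis=1 calls the lambda once per row in Python, this is convenient but can be slow on large frames.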

Answer 2

Answered by MikeGM

A "one"-liner using the underlying NumPy functionality:

df['my date shifted'] = (
    df["mydate"].values.astype("datetime64[M]") 
    + df["month shift"].values.astype("timedelta64[M]")
)
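
Casting to datetime64[M] truncates every date to the start of its month, which is harmless in the question's data because every date is already the 1st. The following is a rough sketch of how the day of month could be carried along when that is not the case. The sample dates are made up for illustration. Note that, unlike pd.DateOffset, this variant does not clamp to the end of a shorter month: 31 January plus one month lands in early March.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with mid-month and end-of-month dates.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2000-03-15", "2001-03-01", "2002-01-31"]),
    "month shift": [1, 2, 1],
})

dates = df["mydate"].values.astype("datetime64[ns]")
month_start = dates.astype("datetime64[M]")                 # truncate to month precision
day_offset = dates - month_start.astype("datetime64[ns]")   # time elapsed since the 1st

# Month arithmetic happens at month precision, then the offset is added back.
shifted_start = month_start + df["month shift"].values.astype("timedelta64[M]")
df["my date shifted"] = shifted_start.astype("datetime64[ns]") + day_offset
print(df)
```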

Answer 3

Answered by godfryd

EdChurn's solution is indeed much faster than Anton Protopopov's answer: in my use case it executes in milliseconds, whereas the version with apply takes minutes. The problem is that the solution EdChurn posted in his comment gives slightly incorrect results. In the example:

import pandas as pd
import numpy as np
import datetime

df = pd.DataFrame()
df['year'] = np.arange(2000,2010)
df['month'] = 3

df['mydate'] = pd.to_datetime((df.year * 10000 + df.month * 100 + 1).apply(str), format='%Y%m%d')
df['month shift'] = np.arange(0,10)

The answer of:

df['my date shifted'] = df['mydate'] + pd.TimedeltaIndex( df['month shift'], unit='M')

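gives slightly incorrect results: unit='M' is treated as a fixed average month length, not a calendar month, so the shifted timestamps drift away from the first of the month.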
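
To make the drift concrete, here is a small sketch comparing a fixed-duration "month" with a calendar month. It assumes the average-month convention of 365.2425 / 12 days (2,629,746 seconds) that NumPy uses for the 'M' unit; the constant name is mine. It deliberately avoids calling pd.TimedeltaIndex with unit='M', since recent pandas versions no longer accept that unit.

```python
import pandas as pd

# Assumption: one "average month" is 365.2425 / 12 days = 2,629,746 seconds.
AVERAGE_MONTH = pd.Timedelta(seconds=2_629_746)

start = pd.Timestamp("2001-03-01")
as_fixed_duration = start + AVERAGE_MONTH            # month as a fixed length of time
as_calendar_month = start + pd.DateOffset(months=1)  # month as a calendar step

print(as_fixed_duration)  # 2001-03-31 10:29:06
print(as_calendar_month)  # 2001-04-01 00:00:00
```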

The correct solution can be obtained with:

def set_to_month_begin(series):
    #Following doesn't work:
    #  res = series.dt.floor("MS")

    #This also doesn't work (it fails in case the date is already the first day of the month):
    #  res = series - pd.offsets.MonthBegin(1)

    res = pd.to_datetime(series).dt.normalize()
    res = res - pd.to_timedelta(res.dt.day - 1, unit='d')
    return res

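def add_months(df, date_col, months_num_col):
    """Add the per-row number of months in `months_num_col` to the dates in `date_col`.

    The result is always snapped to the first day of a month; the approach is
    intended for dates that already fall on the first of a month, as in the example.
    This method is *significantly* faster than:
        df.apply(lambda x: x[date_col] + pd.DateOffset(months = x[months_num_col]), axis=1)
    """
    number_of_days_in_avg_month = 365.24 / 12
    # Overshoot by 10 days so the sum lands inside the target month before snapping.
    # pd.to_timedelta is used because the unit argument of pd.TimedeltaIndex is deprecated in recent pandas.
    time_delta = pd.to_timedelta(df[months_num_col] * number_of_days_in_avg_month + 10, unit='D')
    return set_to_month_begin(df[date_col] + time_delta)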

df['my date shifted'] = add_months(df, 'mydate', 'month shift')

For the example data, this gives the same dates as the apply-based Answer 1 above.
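
As an alternative that avoids average-month arithmetic altogether, the shift can be done on integer year and month components. This is a sketch of a different technique, not the method from this answer; add_months_calendar is a hypothetical helper name. It assumes both inputs are Series sharing the same index, drops any time of day, and clamps the day to the end of shorter months (similar in spirit to pd.DateOffset).

```python
import numpy as np
import pandas as pd

def add_months_calendar(dates, months):
    """Hypothetical helper: add a per-row number of calendar months.

    `dates` is a datetime64 Series and `months` an integer Series with the
    same index. Time of day is dropped; the day is clamped to month end.
    """
    # Count months since year 0, shift, then split back into year and month.
    total = dates.dt.year * 12 + (dates.dt.month - 1) + months
    first = pd.to_datetime(
        pd.DataFrame({"year": total // 12, "month": total % 12 + 1, "day": 1})
    )
    # Keep the original day unless the target month is shorter.
    day = np.minimum(dates.dt.day, first.dt.days_in_month)
    return first + pd.to_timedelta(day - 1, unit="D")

df = pd.DataFrame({"year": np.arange(2000, 2010), "month": 3})
df["mydate"] = pd.to_datetime(df[["year", "month"]].assign(day=1))
df["month shift"] = np.arange(0, 10)
df["my date shifted"] = add_months_calendar(df["mydate"], df["month shift"])
print(df[["mydate", "month shift", "my date shifted"]])
```

Everything here is column-wise, so it should scale much better than a row-wise apply, though I have not benchmarked it.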
