Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/35411925/
How to shift dates in a pandas dataframe (add x months)?
Asked by Pythonista anonymous
I have a dataframe with columns of dates.
I know how to shift dates by a fixed number of months (e.g. add 3 months to all the dates in column x); however, I cannot figure out how to shift dates by a number of months that is not fixed but is given by another column of the dataframe.
Any ideas?
I have copied a minimal example below. The error I get is:
The truth value of a Series is ambiguous
Thanks a lot!
import pandas as pd
import numpy as np
import datetime
df = pd.DataFrame()
df['year'] = np.arange(2000,2010)
df['month'] = 3
df['mydate'] = pd.to_datetime( (df.year * 10000 + df.month * 100 +1).apply(str), format='%Y%m%d')
df['month shift'] = np.arange(0,10)
# if I want to shift mydate by 3 months, I can convert it to DatetimeIndex and use dateOffset:
df['my date shifted by 3 months'] = pd.DatetimeIndex( df['mydate'] ) + pd.DateOffset(months = 3)
# however, how do I shift mydate by the number of months in the column 'month shift'?
#This does NOT work:
df['my date shifted'] = pd.DatetimeIndex( df['mydate'] ) + pd.DateOffset(months = df['month shift'])
print(df)
Accepted answer by Anton Protopopov
IIUC you could use apply with axis=1:
In [23]: df.apply(lambda x: x['mydate'] + pd.DateOffset(months = x['month shift']), axis=1)
Out[23]:
0 2000-03-01
1 2001-04-01
2 2002-05-01
3 2003-06-01
4 2004-07-01
5 2005-08-01
6 2006-09-01
7 2007-10-01
8 2008-11-01
9 2009-12-01
dtype: datetime64[ns]
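For reference, here is a self-contained sketch of the same row-wise approach. It reuses the column names from the question, but the sample dates are made up for illustration, and the `int(...)` cast is a defensive addition that is not part of the original answer. It also shows that `pd.DateOffset` does calendar arithmetic: a shift that would land on a non-existent day (January 31 plus one month) is clamped to the last day of the target month.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2000-03-01", "2001-01-31", "2002-05-15"]),
    "month shift": [0, 1, 2],
})

# Build one DateOffset per row; int() guards against numpy integer types.
df["my date shifted"] = df.apply(
    lambda row: row["mydate"] + pd.DateOffset(months=int(row["month shift"])),
    axis=1,
)

print(df)
```

Because a Python-level function is called once per row, this is convenient for small frames but can be slow on large ones.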
Answered by MikeGM
"one"-liner using the underlying numpy functionality:
df['my date shifted'] = (
    df["mydate"].values.astype("datetime64[M]")
    + df["month shift"].values.astype("timedelta64[M]")
)
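One thing to keep in mind with the numpy approach: casting to `datetime64[M]` truncates each date to month resolution, so the day of the month is dropped. That is harmless in the question's example, where every date is the first of a month. The sketch below uses made-up sample dates to show the truncation and one possible way to add the day offset back. The column names `truncated` and `with day` are hypothetical, and the day-restoring step is my own addition rather than part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with dates that are not all on the 1st.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2000-03-01", "2001-04-15", "2002-01-31"]),
    "month shift": [0, 1, 1],
})

dates = df["mydate"].values.astype("datetime64[ns]")
month_start = dates.astype("datetime64[M]")               # truncate to the month
day_offset = dates - month_start.astype("datetime64[ns]")  # time elapsed within the month

# Month arithmetic happens at month resolution, then we return to nanoseconds.
shifted_month = month_start + df["month shift"].values.astype("timedelta64[M]")
df["truncated"] = shifted_month.astype("datetime64[ns]")
df["with day"] = shifted_month.astype("datetime64[ns]") + day_offset

print(df)
```

Unlike `pd.DateOffset`, adding the day offset back does not clamp to the end of the month: January 31 plus one month overflows into March here instead of stopping at February 28.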
Answered by godfryd
EdChurn's solution is indeed much faster than Anton Protopopov's answer: in my use case it executes in milliseconds, whereas the version with apply takes minutes. The problem is that the solution EdChurn posted in his comment gives slightly incorrect results. In the example:
import pandas as pd
import numpy as np
import datetime
df = pd.DataFrame()
df['year'] = np.arange(2000,2010)
df['month'] = 3
df['mydate'] = pd.to_datetime((df.year * 10000 + df.month * 100 + 1).apply(str), format='%Y%m%d')
df['month shift'] = np.arange(0,10)
The answer of:
df['my date shifted'] = df['mydate'] + pd.TimedeltaIndex( df['month shift'], unit='M')
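To illustrate why a fixed-length month gives slightly wrong dates, here is a minimal sketch. It assumes that `unit='M'` was interpreted as an average month of 365.2425 / 12 days, which is my understanding of the older behaviour rather than something stated in the answer. As far as I know, recent pandas releases no longer accept `unit='M'` for timedeltas at all, so the sketch spells the average month out in days. The names `AVG_MONTH_DAYS`, `fixed_length` and `calendar` are hypothetical.

```python
import pandas as pd

# Assumed length of an "average month" in days (about 30.44).
AVG_MONTH_DAYS = 365.2425 / 12

start = pd.Timestamp("2001-03-01")

# Fixed-length arithmetic: add roughly 30.44 days.
fixed_length = start + pd.to_timedelta(1 * AVG_MONTH_DAYS, unit="D")

# Calendar arithmetic: move to the same day of the next month.
calendar = start + pd.DateOffset(months=1)

print(fixed_length)  # falls on 2001-03-31, partway through the day
print(calendar)      # 2001-04-01 00:00:00
```

March has 31 days, so adding an average month falls short of April 1, and the result also picks up a time-of-day component.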
The correct solution can be obtained with:
def set_to_month_begin(series):
    # The following doesn't work:
    #   res = series.dt.floor("MS")
    # This also doesn't work (it fails when the date is already the first day of the month):
    #   res = series - pd.offsets.MonthBegin(1)
    res = pd.to_datetime(series).dt.normalize()
    # Subtract (day - 1) days so that every date snaps back to the first of its month.
    res = res - pd.to_timedelta(res.dt.day - 1, unit='d')
    return res
def add_months(df, date_col, months_num_col):
    """Add the number of months given per row in `months_num_col` to the date in `date_col`.
    The result is always the first day of the target month.
    This method is *significantly* faster than:
        df.apply(lambda x: x[date_col] + pd.DateOffset(months = x[months_num_col]), axis=1)
    """
    number_of_days_in_avg_month = 365.24 / 12
    # The extra 10 days push the approximate sum well inside the target month
    # (this relies on the input dates being at or near the start of a month).
    # pd.to_timedelta is used here because newer pandas versions deprecate the
    # `unit` argument of the pd.TimedeltaIndex constructor.
    time_delta = pd.to_timedelta(df[months_num_col] * number_of_days_in_avg_month + 10, unit='D')
    return set_to_month_begin(df[date_col] + time_delta)
df['my date shifted'] = add_months(df, 'mydate', 'month shift')