Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/30328427/

Time: 2020-09-13 23:22:17  Source: igfitidea

Add months to a datetime column in pandas

python, python-2.7, python-3.x, pandas, ipython

Asked by 0nir

I have a dataframe df with 2 columns as below -

               START_DATE             MONTHS
0              2015-03-21                240
1              2015-03-21                240
2              2015-03-21                240
3              2015-03-21                240
4              2015-03-21                240
5              2015-01-01                120
6              2017-01-01                240
7                     NaN                NaN
8                     NaN                NaN
9                     NaN                NaN

The datatypes of the 2 columns are objects.

>>> df.dtypes
START_DATE    object
MONTHS        object
dtype: object

Now, I want to create a new column "Result" by adding df['START_DATE'] & df['MONTHS']. So, I have done the below -

from dateutil.relativedelta import relativedelta  

df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MONTHS'] = df['MONTHS'].astype(float)

df['offset'] = df['MONTHS'].apply(lambda x: relativedelta(months=x))

df['Result'] = df['START_DATE'] + df['offset'] 

Here, I get the below error -

TypeError: incompatible type [object] for a datetime/timedelta operation

Note: I wanted to convert df['MONTHS'] to int, but that does not work because the column contains nulls.

Can you please give me some direction? Thanks.

Answered by Jeff

This is a vectorized way to do this, so it should be quite performant. Note that it doesn't handle month crossings or month ends, and it doesn't deal well with DST changes; I believe that's why you get the times.

In [32]: df['START_DATE'] + df['MONTHS'].values.astype("timedelta64[M]")
Out[32]: 
0   2035-03-20 20:24:00
1   2035-03-20 20:24:00
2   2035-03-20 20:24:00
3   2035-03-20 20:24:00
4   2035-03-20 20:24:00
5   2024-12-31 10:12:00
6   2036-12-31 20:24:00
7                   NaT
8                   NaT
9                   NaT
Name: START_DATE, dtype: datetime64[ns]
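
The clock times in this output are consistent with each month being counted as an average-length month rather than a calendar month. The sketch below is plain Python, not NumPy's or pandas' own implementation; the constant `AVG_MONTH_SECONDS` and the helper `add_average_months` are illustrative assumptions (one average month taken as 365.2425 / 12 days). It reproduces the values shown above, and also the 57-month case reported in a later answer.

```python
from datetime import datetime, timedelta

# Assumption: one "average month" = 365.2425 / 12 days = 30.436875 days,
# which is exactly 2,629,746 seconds.
AVG_MONTH_SECONDS = 2_629_746


def add_average_months(start, months):
    """Add `months` average-length months to `start` (hypothetical helper)."""
    return start + timedelta(seconds=months * AVG_MONTH_SECONDS)


a = add_average_months(datetime(2015, 3, 21), 240)
b = add_average_months(datetime(2015, 1, 1), 120)
c = add_average_months(datetime(2020, 5, 4), 57)

print(a)  # 2035-03-20 20:24:00
print(b)  # 2024-12-31 10:12:00
print(c)  # 2025-02-01 21:38:42
```

Because an average month is not a whole number of days, the result drifts away from the calendar day, which is why this approach is only approximate.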

If you need exact MonthEnd/MonthBegin handling, this is an appropriate method. (Use pd.DateOffset(months=...) to land on the same day of the month.)

In [33]: df.dropna().apply(lambda x: x['START_DATE'] + pd.offsets.MonthEnd(x['MONTHS']), axis=1)
Out[33]: 
0   2035-02-28
1   2035-02-28
2   2035-02-28
3   2035-02-28
4   2035-02-28
5   2024-12-31
6   2036-12-31
dtype: datetime64[ns]
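
To make the difference between the two kinds of offset concrete, here is a small sketch on single timestamps. The variable names are illustrative; `pd.offsets.MonthEnd` and `pd.DateOffset` are the pandas offsets involved.

```python
import pandas as pd

start = pd.Timestamp("2015-03-21")

# MonthEnd(n) rolls forward to the n-th month end on or after the start date.
month_end = start + pd.offsets.MonthEnd(240)

# DateOffset(months=n) keeps the day of the month where that day exists.
same_day = start + pd.DateOffset(months=240)

# When the day does not exist in the target month, it is clipped to the month end.
clipped = pd.Timestamp("2020-01-31") + pd.DateOffset(months=1)

print(month_end)  # 2035-02-28 00:00:00
print(same_day)   # 2035-03-21 00:00:00
print(clipped)    # 2020-02-29 00:00:00
```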

Answered by Kathirmani Sukumar

Use the following if your dataframe is small. I have used axis=1, which applies the function row by row, so it will be very slow on a large dataframe.

> df['offset'] = df.dropna().apply(lambda v: relativedelta(months=int(v['MONTHS'])) + v['START_DATE'], axis=1)
> df
  START_DATE  MONTHS     offset
0 2015-03-21     240 2035-03-21
1 2015-03-21     240 2035-03-21
2 2015-03-21     240 2035-03-21
3 2015-03-21     240 2035-03-21
4 2015-03-21     240 2035-03-21
5 2015-01-01     120 2025-01-01
6 2017-01-01     240 2037-01-01
7        NaT     NaN        NaT
8        NaT     NaN        NaT
9        NaT     NaN        NaT
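
If the row-wise apply becomes too slow, one possible alternative is to loop over the distinct values of MONTHS instead of over rows, adding a single `pd.DateOffset` to each group at once. This is a sketch under the assumption that MONTHS has few distinct values; the small frame and the names `valid` and `pieces` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "START_DATE": pd.to_datetime(["2015-03-21", "2015-03-21", "2015-01-01", None]),
    "MONTHS": [240.0, 240.0, 120.0, None],
})

# Only rows with both a start date and a month count can be shifted.
valid = df.dropna(subset=["START_DATE", "MONTHS"])

# One DateOffset addition per distinct month count, applied to the whole group.
pieces = [
    group["START_DATE"] + pd.DateOffset(months=int(months))
    for months, group in valid.groupby("MONTHS")
]

# Assignment aligns on the index, so rows that were dropped end up as NaT.
df["Result"] = pd.concat(pieces)
print(df)
```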

Answered by selwyth

Here's a way to do it without dateutil.relativedelta. Note that I convert MONTHS to an integer (and only after dropping the null values, since int doesn't accept null values) because I want to do integer division by 12 months per year, exploiting the fact that the quotient is the delta in years and the remainder is the delta in months.

import pandas as pd

df = pd.DataFrame({'START_DATE':['2015-03-21','2015-03-21','2015-03-21','2015-03-21',
                                 '2015-03-21','2015-01-01','2017-01-01', None,None,None],
                   'MONTHS':[240,240,240,240,240,120,240,None,None,None]},
                  dtype='object') # replicate example data

df.dropna(inplace=True) # drop nulls so can convert MONTHS to int
df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MONTHS'] = df.MONTHS.astype(int)

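def add_months(row):
    # Quotient of MONTHS / 12 is the change in years; the remainder is the change in months.
    years, months = divmod(int(row.MONTHS), 12)
    # Carry into the year when the month runs past December.
    carry, month0 = divmod(row.START_DATE.month - 1 + months, 12)
    # Note: raises ValueError if the day does not exist in the target month (e.g. the 31st).
    return pd.Timestamp(year=row.START_DATE.year + years + carry,
                        month=month0 + 1, day=row.START_DATE.day)

df.apply(add_months, axis=1)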
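
The same year/month arithmetic can also be written with only the standard library. The sketch below is one possible version, not part of the original answer; the helper name `add_months` and the choice to clamp the day to the last day of the target month are assumptions.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift `start` by whole calendar months, clamping the day if needed."""
    # Count months from year 0 so that overflow past December carries into the year.
    total = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    # Clamp, for example, the 31st to the last day of a shorter month.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


print(add_months(date(2015, 3, 21), 240))  # 2035-03-21
print(add_months(date(2020, 5, 4), 57))    # 2025-02-04
print(add_months(date(2020, 1, 31), 1))    # 2020-02-29
```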

Answered by MaxU

Here is yet another vectorized NumPy solution:

In [111]: mask = (df.START_DATE.notnull() & df.MONTHS.notnull())

In [112]: df.loc[mask, 'Result'] = (
     ...:     df.START_DATE.loc[mask].values.astype('M8[M]') + \
     ...:     (df.MONTHS.loc[mask].values.astype(int) * np.timedelta64(1, 'M'))
     ...: ).astype('M8[D]') - np.timedelta64(1, 'D')
     ...:

In [113]: df
Out[113]:
  START_DATE  MONTHS     Result
0 2015-03-21   240.0 2035-02-28
1 2015-03-21   240.0 2035-02-28
2 2015-03-21   240.0 2035-02-28
3 2015-03-21   240.0 2035-02-28
4 2015-03-21   240.0 2035-02-28
5 2015-01-01   120.0 2024-12-31
6 2017-01-01   240.0 2036-12-31
7        NaT     NaN        NaT
8        NaT     NaN        NaT
9        NaT     NaN        NaT
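
The result above lands on the last day of the preceding month. If the goal is to keep the original day of the month, a variation on the same month-truncation idea is to add the day offset back afterwards. This is a sketch with illustrative sample arrays, not the answer's own code; note that it does not clamp, so a day such as the 31st spills over into the following month when the target month is shorter.

```python
import numpy as np

dates = np.array(["2015-03-21", "2020-05-04"], dtype="M8[D]")
months = np.array([240, 57])

# Truncate each date to the first of its month.
month_start = dates.astype("M8[M]")

# Remember how many days into the month each date was.
day_offset = dates - month_start.astype("M8[D]")

# Add whole calendar months, go back to day precision, then restore the day offset.
result = (month_start + months.astype("m8[M]")).astype("M8[D]") + day_offset

print(result)  # ['2035-03-21' '2025-02-04']
```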

Answered by MANDAR PATIL

In response to Jeff: I think this doesn't work correctly when the number of months is not a multiple of 12. For example, I had an initial date of 2020-05-04 (yyyy-mm-dd) and 57 months, but the addition gave 2025-02-01 instead of 2025-02-04.

init_workbook['CALC_DATE']= init_workbook['STRTDATE']+init_workbook['MONTHS'].values.astype("timedelta64[M]")

>>> init_workbook.head(4)
   MONTHS    STRTDATE   CALC_DATE
0      12  2020-05-04  2021-05-04
1      12  2020-05-04  2021-05-04
2      57  2020-05-04  2025-02-01
3      34  2020-05-20  2023-03-20

Again, when the date is greater than 12 it gives the correct result, but when the date is less than 12 it fails.