Adding months to a datetime column in Pandas

Question

Asked by 0nir

I have a dataframe df with 2 columns, as shown below:

               START_DATE             MONTHS
0              2015-03-21                240
1              2015-03-21                240
2              2015-03-21                240
3              2015-03-21                240
4              2015-03-21                240
5              2015-01-01                120
6              2017-01-01                240
7                     NaN                NaN
8                     NaN                NaN
9                     NaN                NaN

Both columns have dtype object.

>>> df.dtypes
START_DATE    object
MONTHS        object
dtype: object

Now I want to create a new column "Result" by adding df['MONTHS'] months to df['START_DATE']. So I did the following:

import pandas as pd
from dateutil.relativedelta import relativedelta

df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MONTHS'] = df['MONTHS'].astype(float)

df['offset'] = df['MONTHS'].apply(lambda x: relativedelta(months=x))

df['Result'] = df['START_DATE'] + df['offset']

Here I get the following error:

TypeError: incompatible type [object] for a datetime/timedelta operation

Note: I wanted to convert df['MONTHS'] to int, but that wouldn't work because the column contains nulls.

Can you please give me some directions? Thanks.

Answer 1

Answered by Jeff

This is a vectorized way to do this, so it should be quite performant. Note that it doesn't handle month crossings/endings and doesn't deal well with DST changes; I believe that's why you get the times of day in the output.

In [32]: df['START_DATE'] + df['MONTHS'].values.astype("timedelta64[M]")
Out[32]: 
0   2035-03-20 20:24:00
1   2035-03-20 20:24:00
2   2035-03-20 20:24:00
3   2035-03-20 20:24:00
4   2035-03-20 20:24:00
5   2024-12-31 10:12:00
6   2036-12-31 20:24:00
7                   NaT
8                   NaT
9                   NaT
Name: START_DATE, dtype: datetime64[ns]

If you need exact MonthEnd/MonthBegin handling, this is an appropriate method. (Use pd.DateOffset(months=...) instead to land on the same day of the month.)

In [33]: df.dropna().apply(lambda x: x['START_DATE'] + pd.offsets.MonthEnd(x['MONTHS']), axis=1)
Out[33]: 
0   2035-02-28
1   2035-02-28
2   2035-02-28
3   2035-02-28
4   2035-02-28
5   2024-12-31
6   2036-12-31
dtype: datetime64[ns]
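
To illustrate the "same day" variant mentioned above, here is a minimal sketch that uses pd.DateOffset(months=...) row by row and leaves rows with missing values as NaT. The sample frame is a shortened copy of the question's data, and the name result is only an illustrative helper, not part of the original answer.

```python
import pandas as pd

# Shortened copy of the question's data (object dtype, with missing values).
df = pd.DataFrame(
    {
        "START_DATE": ["2015-03-21", "2015-01-01", "2017-01-01", None],
        "MONTHS": [240, 120, 240, None],
    },
    dtype="object",
)

df["START_DATE"] = pd.to_datetime(df["START_DATE"])  # None becomes NaT
df["MONTHS"] = pd.to_numeric(df["MONTHS"])           # None becomes NaN (float64)

# Add a calendar-aware month offset per row; keep NaT where either input is missing.
result = [
    start + pd.DateOffset(months=int(months))
    if pd.notna(start) and pd.notna(months)
    else pd.NaT
    for start, months in zip(df["START_DATE"], df["MONTHS"])
]
df["Result"] = pd.to_datetime(pd.Series(result, index=df.index))

print(df)
```

Because this loops in Python, it trades speed for correctness on month boundaries, so it is best suited to small or medium frames.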

Answer 2

Answered by Kathirmani Sukumar

Use the following if your dataframe is small. It uses axis=1, which applies the function row by row, so it will be very slow on a large dataframe.

> df['offset'] = df.dropna().apply(lambda v: relativedelta(months=int(v['MONTHS'])) + v['START_DATE'], axis=1)
> df
  START_DATE  MONTHS     offset
0 2015-03-21     240 2035-03-21
1 2015-03-21     240 2035-03-21
2 2015-03-21     240 2035-03-21
3 2015-03-21     240 2035-03-21
4 2015-03-21     240 2035-03-21
5 2015-01-01     120 2025-01-01
6 2017-01-01     240 2037-01-01
7        NaT     NaN        NaT
8        NaT     NaN        NaT
9        NaT     NaN        NaT

Answer 3

Answered by selwyth

Here's a way to do it without dateutil.relativedelta. Note that I convert MONTHS to an integer (and only after dropping the null values, since int doesn't accept nulls) because I want to do integer division by 12 months per year, exploiting the fact that the quotient is the delta in years and the remainder is the delta in months.

import pandas as pd

df = pd.DataFrame({'START_DATE':['2015-03-21','2015-03-21','2015-03-21','2015-03-21',
                                 '2015-03-21','2015-01-01','2017-01-01', None,None,None],
                   'MONTHS':[240,240,240,240,240,120,240,None,None,None]},
                  dtype='object') # replicate example data

df.dropna(inplace=True) # drop nulls so can convert MONTHS to int
df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MONTHS'] = df.MONTHS.astype(int)

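# Use integer division and fold any month overflow into the year (month is made zero-based first).
df.apply(lambda x: pd.Timestamp(year=x.START_DATE.year + (x.START_DATE.month - 1 + x.MONTHS) // 12,
                                month=(x.START_DATE.month - 1 + x.MONTHS) % 12 + 1,
                                day=x.START_DATE.day), axis=1)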
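
The quotient/remainder idea described in this answer can also be written as a small standalone helper. The sketch below is an illustration rather than selwyth's code: add_months is a hypothetical name, and clamping the day to the length of the target month is an added assumption so that dates such as January 31 do not raise an error.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add a whole number of months, clamping the day to the target month's length."""
    # Count months from year 0 with a zero-based month so divmod gives the year carry directly.
    total = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    # Assumption: if the day does not exist in the target month, use that month's last day.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


print(add_months(date(2015, 3, 21), 240))  # 2035-03-21
print(add_months(date(2020, 5, 4), 57))    # 2025-02-04
print(add_months(date(2020, 1, 31), 1))    # 2020-02-29
```

Such a helper can be passed to Series.apply on a column of dates, at the usual cost of a Python-level loop.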

Answer 4

Answered by MaxU

Here is yet another vectorized NumPy solution. As written, it returns the last day of the month preceding the target month:

In [111]: mask = (df.START_DATE.notnull() & df.MONTHS.notnull())

In [112]: df.loc[mask, 'Result'] = (
     ...:     df.START_DATE.loc[mask].values.astype('M8[M]') + \
     ...:     (df.MONTHS.loc[mask].values.astype(int) * np.timedelta64(1, 'M'))
     ...: ).astype('M8[D]') - np.timedelta64(1, 'D')
     ...:

In [113]: df
Out[113]:
  START_DATE  MONTHS     Result
0 2015-03-21   240.0 2035-02-28
1 2015-03-21   240.0 2035-02-28
2 2015-03-21   240.0 2035-02-28
3 2015-03-21   240.0 2035-02-28
4 2015-03-21   240.0 2035-02-28
5 2015-01-01   120.0 2024-12-31
6 2017-01-01   240.0 2036-12-31
7        NaT     NaN        NaT
8        NaT     NaN        NaT
9        NaT     NaN        NaT
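
If the original day of the month should be kept instead, one possible variation on the same NumPy idea is to truncate to the month, add the months, and then add the day-of-month offset back. This is a sketch under that assumption, not MaxU's code; the names start, months, and result are illustrative, and a day that does not exist in the target month rolls over into the following month.

```python
import numpy as np
import pandas as pd

start = pd.to_datetime(pd.Series(["2015-03-21", "2020-05-04", None]))
months = pd.Series([240, 57, None], dtype="float64")

# Only operate on rows where both inputs are present.
mask = (start.notna() & months.notna()).to_numpy()

days = start.to_numpy()[mask].astype("M8[D]")       # dates at day resolution
month_start = days.astype("M8[M]")                  # truncate to the month
day_of_month = days - month_start.astype("M8[D]")   # offset within the month, in days
n = months.to_numpy()[mask].astype("int64")

# Month arithmetic is exact at month resolution; then restore the day offset.
shifted = (month_start + n * np.timedelta64(1, "M")).astype("M8[D]") + day_of_month

result = pd.Series(pd.NaT, index=start.index, dtype="datetime64[ns]")
result[mask] = shifted.astype("M8[ns]")

print(result)
```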

Answer 5

Answered by MANDAR PATIL

In response to Jeff: I think this doesn't work correctly for month counts that are not a multiple of 12. For example, I had an initial date of 2020-05-04 (yyyy-mm-dd) and 57 months, but the addition gave 2025-02-01 instead of 2025-02-04.

init_workbook['CALC_DATE']= init_workbook['STRTDATE']+init_workbook['MONTHS'].values.astype("timedelta64[M]")

>>> init_workbook.head(4)
   MONTHS    STRTDATE   CALC_DATE
0      12  2020-05-04  2021-05-04
1      12  2020-05-04  2021-05-04
2      57  2020-05-04  2025-02-01
3      34  2020-05-20  2023-03-20

Again, if the day of the month is greater than 12 it gives the correct result, but if the day is less than 12, that's where it fails.
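
The numbers in this thread appear consistent with each month being converted to a fixed average length of 365.2425 / 12 = 30.436875 days rather than a calendar month. The sketch below reproduces the 2025-02-01 result under that assumption and compares it with a calendar-aware pd.DateOffset; the constant name AVG_DAYS_PER_MONTH is introduced here only for illustration.

```python
import pandas as pd

# Assumption: a "month" is treated as a fixed average length of 30.436875 days.
AVG_DAYS_PER_MONTH = 365.2425 / 12

start = pd.Timestamp("2020-05-04")

# Fixed-length arithmetic: drifts away from the calendar and picks up a time of day.
approx = start + pd.Timedelta(days=57 * AVG_DAYS_PER_MONTH)

# Calendar-aware arithmetic: keeps the same day of the month.
exact = start + pd.DateOffset(months=57)

print(approx.normalize())  # 2025-02-01 00:00:00
print(exact)               # 2025-02-04 00:00:00
```

The drift is a fraction of a month's length, so whether the resulting date looks right depends on how far the start day sits from a month boundary, which would explain why some rows in the table above happen to come out correct.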
