Pandas - Number of Months Between Two Dates

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/42822768/

Time: 2020-09-08 15:47:16  Source: igfitidea

Pandas - Number of Months Between Two Dates

Tags: date, pandas, diff

Asked by shavar

I think this should be simple, but the techniques I've seen involve iterating over a dataframe's date fields to determine the difference between two dates, and I'm having trouble with them. I'm familiar with MSSQL DATEDIFF, so I thought Pandas datetime would have something similar. Perhaps it does and I'm missing it.

Is there a Pandonic way of determining the number of months, as an integer, between two dates (datetime) without the need to iterate? Keep in mind that there are potentially millions of rows, so performance is a consideration.

The dates are datetime objects and the result would look like this, with Months being the new column:

Date1           Date2         Months
2016-04-07      2017-02-01    11
2017-02-01      2017-03-05    1

Answered by ??????

Here is a very simple answer, my friend:

df['nb_months'] = ((df.date2 - df.date1)/np.timedelta64(1, 'M'))

And now, cast the result to integers:

df['nb_months'] = df['nb_months'].astype(int)
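
Newer pandas releases may reject the ambiguous 'M' unit in this kind of division. The following is a minimal sketch of the same elapsed-time idea with the month length spelled out explicitly. It assumes the question's column names (Date1, Date2) rather than the date1/date2 used above, and it assumes an average Gregorian month of 365.2425 / 12 days; the name AVG_MONTH is made up for illustration.

```python
import pandas as pd

# Sample data mirroring the table in the question (column names assumed).
df = pd.DataFrame({
    "Date1": pd.to_datetime(["2016-04-07", "2017-02-01"]),
    "Date2": pd.to_datetime(["2017-02-01", "2017-03-05"]),
})

# Assumption: one "month" is an average Gregorian month (about 30.44 days).
AVG_MONTH = pd.Timedelta(days=365.2425 / 12)

# Dividing a timedelta Series by a Timedelta scalar yields floats;
# astype(int) then truncates toward zero.
df["nb_months"] = ((df["Date2"] - df["Date1"]) / AVG_MONTH).astype(int)
print(df)
```

Because this measures elapsed time, the first row (300 days apart) is truncated to fewer months than a calendar-based count would report.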

Answered by piRSquared

df.assign(
    Months=
    (df.Date2.dt.year - df.Date1.dt.year) * 12 +
    (df.Date2.dt.month - df.Date1.dt.month)
)

       Date1      Date2  Months
0 2016-04-07 2017-02-01      10
1 2017-02-01 2017-03-05       1
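
The expression above counts month boundaries crossed and ignores the day of the month. If only fully completed months should count, one possible variation (not part of the original answer) is to subtract one whenever Date2's day falls before Date1's day. This sketch assumes Date2 is never earlier than Date1; the column name FullMonths is made up for illustration.

```python
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "Date1": pd.to_datetime(["2016-04-07", "2017-02-01"]),
    "Date2": pd.to_datetime(["2017-02-01", "2017-03-05"]),
})

# Month boundaries crossed, exactly as in the answer above.
calendar_months = (
    (df["Date2"].dt.year - df["Date1"].dt.year) * 12
    + (df["Date2"].dt.month - df["Date1"].dt.month)
)

# 1 where the last month is not yet complete, 0 otherwise.
not_complete = (df["Date2"].dt.day < df["Date1"].dt.day).astype(int)

df["FullMonths"] = calendar_months - not_complete
print(df)
```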

Answered by pberkes

An alternative, possibly more elegant solution is df.Date2.dt.to_period('M') - df.Date1.dt.to_period('M'), which avoids rounding errors.
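
Depending on the pandas version, subtracting two monthly period Series may return plain integers or offset objects rather than an integer column. A hedged sketch that handles both cases, using the question's column names:

```python
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "Date1": pd.to_datetime(["2016-04-07", "2017-02-01"]),
    "Date2": pd.to_datetime(["2017-02-01", "2017-03-05"]),
})

diff = df["Date2"].dt.to_period("M") - df["Date1"].dt.to_period("M")

# If the elements are offset objects, take their .n attribute;
# if they are already integers, keep them unchanged.
df["Months"] = diff.apply(lambda d: getattr(d, "n", d)).astype(int)
print(df)
```

Note that the apply step runs element by element, so on millions of rows the year/month arithmetic shown in the previous answer is likely to be faster.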

Answered by Nils

There are two notions of difference in time, which are both correct in a certain sense. Let us compare the difference in months between July 31 and September 01:

import numpy as np
import pandas as pd

dtr = pd.date_range(start="2016-07-31", end="2016-09-01", freq="D")
delta1 = int((dtr[-1] - dtr[0])/np.timedelta64(1,'M'))
delta2 = (dtr[-1].to_period('M') - dtr[0].to_period('M')).n
print(delta1,delta2)

Using numpy's timedelta, delta1 = 1, which is correct in the sense that only about one month of time lies between the two dates. Using periods, delta2 = 2, which is also correct in the sense that, seen from July, September is two calendar months away. In most cases both will give the same answer, but depending on the context one may be more appropriate than the other.
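
The two notions can also be written as small helper functions to make the difference explicit. This is only a sketch: the function names are made up for illustration, and the elapsed-time version assumes an average month of 365.2425 / 12 days instead of relying on the 'M' timedelta unit.

```python
import pandas as pd

def elapsed_months(start, end):
    """Whole months of elapsed time, assuming an average-length month."""
    avg_month = pd.Timedelta(days=365.2425 / 12)
    return int((end - start) / avg_month)

def calendar_months(start, end):
    """Number of month boundaries crossed, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)

start = pd.Timestamp("2016-07-31")
end = pd.Timestamp("2016-09-01")
result = (elapsed_months(start, end), calendar_months(start, end))
print(result)  # the two notions disagree for this pair of dates

# For dates at the start and end of a span they agree.
same = (
    elapsed_months(pd.Timestamp("2016-07-01"), pd.Timestamp("2016-09-30")),
    calendar_months(pd.Timestamp("2016-07-01"), pd.Timestamp("2016-09-30")),
)
print(same)
```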