python / pandas find number of years between two dates

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/42521107/

Date: 2020-09-14 03:06:13  Source: igfitidea

Tags: python, date, pandas, datetime

Asked by user139188

I have a pandas dataframe with two columns that contain dates. I want to know the number of years between the two dates while accounting for leap years.

The ideal solution would not assume that a year always has a fixed number of days, because years don't always have 365 days.

Sample data:

date_end    date_start
2010-02-09  1933-03-03
2010-03-19  1924-04-08
2010-04-19  1924-04-08
2010-09-06  1924-04-08
2010-09-24  1924-04-08
2010-01-09  1933-04-29
2010-02-26  1933-04-29
2010-01-31  1953-06-10
2010-07-07  1928-11-14
2010-12-01  1974-11-17

date_start and date_end are of the "datetime" dtype. I want a new column that is the number of years between the two dates. It is simple to get the number of days between the two dates (df['diff'] = df.date_end - df.date_start), but then I run into trouble: because of leap years, the number of years spanned by a given number of days depends on when those days fall.

This is analogous to a person's age. I've tried adapting a number of solutions to similar questions, but many questions are about number of days or weeks between two dates. I already have a way of getting the number of years without accounting for leap years, but I want to be more correct than that.

Answered by dataflow

Assuming you want to define a year as 365 days, then you can do this:

>>> from datetime import timedelta
>>> df
    date_end date_start  is_leapyear
0 2016-02-28 2015-02-28            0
1 2017-02-28 2016-02-28            1
2 2018-02-28 2017-02-28            0

>>> df['diff_in_days'] = df['date_end'] - df['date_start']
>>> df['diff_in_years'] = df['diff_in_days'] / timedelta(days=365)
>>> df
    date_end date_start  is_leapyear  diff_in_years
0 2016-02-28 2015-02-28            0        1.00000
1 2017-02-28 2016-02-28            1        1.00274
2 2018-02-28 2017-02-28            0        1.00000

As you can see, in years with an extra day (Feb 29), more time elapses between the dates. In your case this would be:

    date_end date_start  diff_in_years
0 2010-02-09 1933-03-03      76.991781
1 2010-03-19 1924-04-08      86.002740
2 2010-04-19 1924-04-08      86.087671
3 2010-09-06 1924-04-08      86.471233
4 2010-09-24 1924-04-08      86.520548
5 2010-01-09 1933-04-29      76.750685
6 2010-02-26 1933-04-29      76.882192
7 2010-01-31 1953-06-10      56.682192
8 2010-07-07 1928-11-14      81.698630
9 2010-12-01 1974-11-17      36.063014
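For readers who want to run the 365-day approach end to end, here is a self-contained sketch that rebuilds the question's sample data. It uses `pd.Timedelta(days=365)` in place of the answer's `datetime.timedelta`; both divide a timedelta column into a float column. Row 1 (366 days per 365-day "year" plus a leap day's worth of drift) and row 9 should match the table above.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date_end": pd.to_datetime([
        "2010-02-09", "2010-03-19", "2010-04-19", "2010-09-06", "2010-09-24",
        "2010-01-09", "2010-02-26", "2010-01-31", "2010-07-07", "2010-12-01",
    ]),
    "date_start": pd.to_datetime([
        "1933-03-03", "1924-04-08", "1924-04-08", "1924-04-08", "1924-04-08",
        "1933-04-29", "1933-04-29", "1953-06-10", "1928-11-14", "1974-11-17",
    ]),
})

# Elapsed time as a Timedelta column, then expressed in 365-day "years".
df["diff_in_days"] = df["date_end"] - df["date_start"]
df["diff_in_years"] = df["diff_in_days"] / pd.Timedelta(days=365)
print(df[["date_end", "date_start", "diff_in_years"]])
```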

On the other hand, if you just want the difference in calendar years, i.e. subtracting the years in which the dates fall (regardless of when in the year each date falls), you can do this:

df['date_end_year'] = df.date_end.dt.year
df['date_start_year'] = df.date_start.dt.year
df['diff_in_years'] = df['date_end_year'] - df['date_start_year']
print(df[["date_end", "date_start", "diff_in_years"]])

    date_end date_start  diff_in_years
0 2016-02-28 2015-02-28              1
1 2017-02-28 2016-02-28              1
2 2018-02-28 2017-02-28              1

In your case this would be:

    date_end date_start  diff_in_years
0 2010-02-09 1933-03-03             77
1 2010-03-19 1924-04-08             86
2 2010-04-19 1924-04-08             86
3 2010-09-06 1924-04-08             86
4 2010-09-24 1924-04-08             86
5 2010-01-09 1933-04-29             77
6 2010-02-26 1933-04-29             77
7 2010-01-31 1953-06-10             57
8 2010-07-07 1928-11-14             82
9 2010-12-01 1974-11-17             36
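Subtracting calendar years counts a year even when the end date's anniversary has not been reached yet: row 0 gives 77, although 2010-02-09 falls before the 77th anniversary of 1933-03-03 (2010-03-03). If what you want is whole elapsed years, as with a person's age, one sketch (not part of the answer above; `completed_years` is a hypothetical helper name) subtracts one whenever the end date's (month, day) comes before the start date's. It assumes a Feb 29 start date only completes a year on Mar 1 in non-leap years.

```python
import pandas as pd


def completed_years(start: pd.Series, end: pd.Series) -> pd.Series:
    """Whole years elapsed between start and end, like computing an age.

    Assumption: a Feb 29 start only counts as a full year on Mar 1 in non-leap years.
    """
    years = end.dt.year - start.dt.year
    # Anniversary not yet reached if the end date's (month, day) is earlier in the year.
    before_anniversary = (end.dt.month < start.dt.month) | (
        (end.dt.month == start.dt.month) & (end.dt.day < start.dt.day)
    )
    return years - before_anniversary.astype(int)


start = pd.to_datetime(pd.Series(["1933-03-03", "1924-04-08", "1974-11-17"]))
end = pd.to_datetime(pd.Series(["2010-02-09", "2010-04-19", "2010-12-01"]))
print(completed_years(start, end).tolist())  # [76, 86, 36]
```

Because it only compares year, month and day, this never needs to know how many days a year has.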

Answered by Marta Taulet

You could convert your columns to years and get the difference in years as simply as:

df.date_end.dt.year - df.date_start.dt.year

Answered by Praveen Yalagandula

I needed to solve the exact same problem and wrote the num_years() function below to compute it.

The code breaks down the period between start_date and end_date into three periods:

  1. From start_date to the end of that year,
  2. Full years between start_date and end_date, and
  3. From start of the year that contains end_date to the end_date.

It also takes care of cases where start_date and end_date fall in the same year, contiguous years, etc.
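The three periods can be summarized in one formula. The notation is introduced here for illustration and is not from the answer: s and e are start_date and end_date, y_s and y_e their years, J_y is January 1 of year y, days(a, b) is the number of days from a to b, and N(y) is 366 for a leap year and 365 otherwise.

```latex
\text{years}(s, e) =
\begin{cases}
\dfrac{\operatorname{days}(s, e)}{N(y_e)} & \text{if } y_s = y_e,\\[2ex]
\dfrac{\operatorname{days}(s, J_{y_s+1})}{N(y_s)} + (y_e - y_s - 1) + \dfrac{\operatorname{days}(J_{y_e}, e)}{N(y_e)} & \text{otherwise.}
\end{cases}
```

For example, s = 2015-07-01 and e = 2017-03-01 give 184/365 + 1 + 59/365 ≈ 1.6658.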

import pandas as pd


def num_days_in_year(date: pd.Timestamp) -> int:
    # 366 days in a leap year, 365 otherwise.
    return 366 if date.is_leap_year else 365


def num_years(start_date: pd.Timestamp, end_date: pd.Timestamp) -> float:
    """
    Compute the number of years between two given dates, accounting
    for leap years.
    :param start_date: Start date as Pandas Timestamp
    :param end_date: End date as Pandas Timestamp
    :return: float representing the number of years
    """
    start_year = start_date.year
    end_year = end_date.year

    if start_year == end_year:
        # Both dates fall in the same year: a single partial year.
        return (end_date - start_date).days / num_days_in_year(end_date)

    # 1. From start_date up to Jan 1 of the following year (also correct for Dec 31).
    next_year_begin = pd.Timestamp(year=start_year + 1, month=1, day=1)
    years = (next_year_begin - start_date).days / num_days_in_year(start_date)

    # 2. Full years strictly between the two years (0 for consecutive years).
    years += end_year - start_year - 1

    # 3. From Jan 1 of end_date's year up to end_date (also correct for Jan 1).
    end_year_begin = pd.Timestamp(year=end_year, month=1, day=1)
    years += (end_date - end_year_begin).days / num_days_in_year(end_date)
    return years
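num_years() works on one pair of Timestamps, so on the question's DataFrame it would be applied row by row, e.g. `df.apply(lambda r: num_years(r.date_start, r.date_end), axis=1)`. A vectorized sketch of the same three-period idea follows, assuming both columns are timezone-naive datetime64 Series; `num_years_vectorized` is a hypothetical name, not part of the answer.

```python
import pandas as pd


def num_years_vectorized(start: pd.Series, end: pd.Series) -> pd.Series:
    start_year = start.dt.year
    end_year = end.dt.year
    # Length of each date's own calendar year: 366 if leap, else 365.
    days_in_start_year = 365 + start.dt.is_leap_year.astype(int)
    days_in_end_year = 365 + end.dt.is_leap_year.astype(int)

    # Jan 1 after start, and Jan 1 of end's year, assembled from year/month/day columns.
    next_year_begin = pd.to_datetime(
        pd.DataFrame({"year": start_year + 1, "month": 1, "day": 1})
    )
    end_year_begin = pd.to_datetime(
        pd.DataFrame({"year": end_year, "month": 1, "day": 1})
    )

    # Periods 1, 2 and 3 from the answer, for dates in different years.
    first = (next_year_begin - start).dt.days / days_in_start_year
    middle = end_year - start_year - 1
    last = (end - end_year_begin).dt.days / days_in_end_year
    across_years = first + middle + last

    # Same-year rows: a single partial year.
    same_year = (end - start).dt.days / days_in_end_year
    return same_year.where(start_year == end_year, across_years)


s = pd.to_datetime(pd.Series(["2015-01-01", "2016-01-01", "2015-07-01"]))
e = pd.to_datetime(pd.Series(["2016-01-01", "2016-07-01", "2017-03-01"]))
print(num_years_vectorized(s, e))
```

On the question's data this would be `df["diff_in_years"] = num_years_vectorized(df.date_start, df.date_end)`.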