Python: calculate the datetime difference in years, months, etc. in a new pandas DataFrame column

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/31490816/

Time: 2020-08-19 10:05:42  Source: igfitidea

calculate datetime-difference in years, months, etc. in a new pandas dataframe column

python datetime pandas timedelta

Asked by beta

I have a pandas dataframe looking like this:

Name    start        end
A       2000-01-10   1970-04-29

I want to add a new column providing the difference between the start and end columns in years, months, days.

So the result should look like:

Name    start        end          diff
A       2000-01-10   1970-04-29   29y9m etc.

The diff column may also be a datetime object or a timedelta object, but the key point for me is that I can easily get the year and month out of it.

What I have tried so far is:

df['diff'] = df['end'] - df['start']

This results in the new column containing 10848 days. However, I do not know how to convert the days to 29y9m etc.

Accepted answer by omri_saadon

With a simple function you can reach your goal.

The function derives the year and month differences from the number of days in the timedelta, treating a year as 364 days and a month as 30 days.

import pandas as pd
import datetime

def parse_date(td):
    # Rough conversion: a year is treated as 364 days and a month as 30 days
    resYear = float(td.days)/364.0                   # number of years, including the fractional part
    resMonth = int((resYear - int(resYear))*364/30)  # months: fractional part of the year times 364, divided by 30
    resYear = int(resYear)
    return str(resYear) + "Y" + str(resMonth) + "m"

df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])
df["delta"] = [parse_date(datetime.datetime.strptime(start, '%Y-%m-%d') - datetime.datetime.strptime(end, '%Y-%m-%d')) for start, end in zip(df["start"], df["end"])]
print(df)

        start         end  delta
0  2000-01-10  1970-04-29  29Y9m
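
If the two columns already hold datetimes, the same approximation can be applied to the whole column at once. This is only a sketch: it keeps the 364-day year and 30-day month used above, assumes start is never earlier than end, and the sample frame is rebuilt here just to make it self-contained.

```python
import pandas as pd

# Sample frame, assuming both columns have already been parsed to datetimes
df = pd.DataFrame({"start": pd.to_datetime(["2000-01-10"]),
                   "end": pd.to_datetime(["1970-04-29"])})

days = (df["start"] - df["end"]).dt.days   # whole days per row
years = days // 364                        # same 364-day year as above
months = (days % 364) // 30                # leftover days in 30-day months
df["delta"] = years.astype(str) + "Y" + months.astype(str) + "m"
print(df)
```

Because the year and month lengths are fixed, the result can differ by a month from a calendar-based difference.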

Answer by DeepSpace

This is pretty straightforward with relativedelta:

from dateutil import relativedelta

>>          end      start
>> 0 1970-04-29 2000-01-10

# df.ix was removed in pandas 1.0, so the column is built from the two date columns directly
df['diff'] = [relativedelta.relativedelta(s, e) for s, e in zip(df['start'], df['end'])]

>>          end      start                                           diff
>> 0 1970-04-29 2000-01-10  relativedelta(years=+29, months=+8, days=+12)
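
To get a string such as 29y8m12d out of a relativedelta, its years, months and days attributes can be formatted directly. A minimal sketch, where format_diff is a hypothetical helper name and the sample frame is rebuilt for completeness:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({"start": pd.to_datetime(["2000-01-10"]),
                   "end": pd.to_datetime(["1970-04-29"])})

def format_diff(start, end):
    # hypothetical helper: calendar difference rendered as e.g. '29y8m12d'
    rd = relativedelta(start, end)
    return "{}y{}m{}d".format(rd.years, rd.months, rd.days)

df["diff"] = [format_diff(s, e) for s, e in zip(df["start"], df["end"])]
print(df)
```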

Answer by Anand S Kumar

You can try the following function to calculate the difference:

def yearmonthdiff(row):
    s = row['start']
    e = row['end']
    y = s.year - e.year
    m = s.month - e.month
    d = s.day - e.day
    # same calendar month: drop a month if the day (or time of day) has not been reached yet
    if m == 0:
        if d < 0:
            m = m - 1
        elif d == 0:
            s1 = s.hour*3600 + s.minute*60 + s.second
            s2 = e.hour*3600 + e.minute*60 + e.second
            if s1 < s2:
                m = m - 1
    # borrow a year when the month difference is negative
    if m < 0:
        y = y - 1
        m = m + 12
    return '{}y{}m'.format(y,m)

Here row is a row of the dataframe. I am assuming your start and end columns are datetime objects. You can then use the DataFrame.apply() function to apply it to each row.

df

Out[92]:
                       start                        end
0 2000-01-10 00:00:00.000000 1970-04-29 00:00:00.000000
1 2015-07-18 17:54:59.070381 2014-01-11 17:55:10.053381

df['diff'] = df.apply(yearmonthdiff, axis=1)

In [97]: df
Out[97]:
                       start                        end   diff
0 2000-01-10 00:00:00.000000 1970-04-29 00:00:00.000000  29y9m
1 2015-07-18 17:54:59.070381 2014-01-11 17:55:10.053381   1y6m
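
Note that the function above compares the day of month only when both dates fall in the same calendar month, which is why the first row shows 9 months where relativedelta reports 8 months and 12 days. A possible variant that always takes the day and time into account is sketched below; year_month_diff is a hypothetical name and the sketch assumes start is not earlier than end.

```python
from datetime import datetime

def year_month_diff(start, end):
    """Whole years and months from `end` up to `start` (assumes start >= end)."""
    months = (start.year - end.year) * 12 + (start.month - end.month)
    # the last month is incomplete if `start` falls earlier in its month than `end`
    if (start.day, start.time()) < (end.day, end.time()):
        months -= 1
    years, months = divmod(months, 12)
    return "{}y{}m".format(years, months)

print(year_month_diff(datetime(2000, 1, 10), datetime(1970, 4, 29)))
print(year_month_diff(datetime(2015, 7, 18, 17, 54, 59), datetime(2014, 1, 11, 17, 55, 10)))
```

It can be applied per row in the same way, for example df.apply(lambda row: year_month_diff(row['start'], row['end']), axis=1).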

Answer by Avi Gelbgiser

I think this is the most 'pandas' way to do it, without using any for loops or defining external functions:

>>> from datetime import datetime
>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['A'], 'start': [datetime(2000, 1, 10)], 'end': [datetime(1970, 4, 29)]})
>>> # in Python 3, map returns an iterator, so it is wrapped in list()
>>> df['diff'] = list(map(lambda td: datetime(1, 1, 1) + td, list(df['start'] - df['end'])))
>>> df['diff'] = df['diff'].apply(lambda d: '{0}y{1}m'.format(d.year - 1, d.month - 1))
>>> df
  Name      start        end   diff
0    A 2000-01-10 1970-04-29  29y8m

I had to use map instead of apply because of pandas' timedelta64, which doesn't allow a simple addition to a datetime object.
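
A fully vectorized alternative is to work on the calendar fields through the .dt accessor. The following is only a sketch under a few assumptions: both columns are datetime64, start is not earlier than end, and the time of day is ignored. The second row is made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B"],
    "start": pd.to_datetime(["2000-01-10", "2015-07-18"]),
    "end": pd.to_datetime(["1970-04-29", "2014-01-11"]),
})

start, end = df["start"], df["end"]
# calendar months between the two dates, ignoring the day of month for now
months = (start.dt.year - end.dt.year) * 12 + (start.dt.month - end.dt.month)
# drop one month where the day of month in `start` is earlier than in `end`
months = months - (start.dt.day < end.dt.day).astype(int)
df["diff"] = (months // 12).astype(str) + "y" + (months % 12).astype(str) + "m"
print(df)
```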

Answer by scottlittle

Similar to @DeepSpace's answer, here's a SAS-like implementation:

import pandas as pd
from dateutil import relativedelta

def intck_month( start, end ):
    rd = relativedelta.relativedelta( pd.to_datetime( end ), pd.to_datetime( start ) )
    return rd.years, rd.months

Usage:

>> years, months = intck_month('1960-01-01', '1970-03-01')
>> print(years)
10
>> print(months)
2

Answer by Pranav Kansara

A much simpler way is to use the date_range function and take the length of the result:

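import pandas as pd
startdt = pd.to_datetime('2017-01-01')
enddt = pd.to_datetime('2018-01-01')
# freq='M' generates month-end dates; pandas 2.2 and later spell this alias 'ME'
len(pd.date_range(start=startdt, end=enddt, freq='M'))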
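
The date_range call counts the month-end dates that lie between the two timestamps. A comparable count can be obtained, as a sketch, by converting both timestamps to monthly periods and subtracting them. Like the version above, this counts month boundaries crossed rather than fully elapsed months.

```python
import pandas as pd

startdt = pd.to_datetime("2017-01-01")
enddt = pd.to_datetime("2018-01-01")

# Subtracting two monthly Periods yields an offset whose .n is the month count
months = (enddt.to_period("M") - startdt.to_period("M")).n
print(months)
```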

Answer by jomesoke

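You can create a new column holding the difference in years like this, assuming df['diff'] is the timedelta column:

import numpy as np

# recent pandas versions no longer accept 'Y' as a timedelta unit,
# so divide by one day and by the average Gregorian year length instead
df['diff_year'] = df['diff'] / np.timedelta64(1, 'D') / 365.2425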