Original question: http://stackoverflow.com/questions/31490816/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
calculate datetime-difference in years, months, etc. in a new pandas dataframe column
Asked by beta
I have a pandas dataframe looking like this:
Name start end
A 2000-01-10 1970-04-29
I want to add a new column providing the difference between the start and end columns in years, months, days.
So the result should look like:
Name start end diff
A 2000-01-10 1970-04-29 29y9m etc.
The diff column may also be a datetime object or a timedelta object, but the key point for me is that I can easily get the year and month out of it.
What I have tried so far is:
df['diff'] = df['end'] - df['start']
This results in the new column containing 10848 days. However, I do not know how to convert the days to 29y9m etc.
Accepted answer by omri_saadon
With a simple function you can reach your goal.
The function calculates the difference in years and months with a simple calculation that treats a year as 364 days and a month as 30 days.
import datetime
import pandas as pd
def parse_date(td):
    # Approximation: a year is treated as 364 days and a month as 30 days.
    resYear = float(td.days) / 364.0  # number of years, including the fractional part
    resMonth = int((resYear - int(resYear)) * 364 / 30)  # fractional year converted back to days, then to 30-day months
    resYear = int(resYear)
    return str(resYear) + "Y" + str(resMonth) + "m"
df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])
df["delta"] = [parse_date(datetime.datetime.strptime(start, '%Y-%m-%d') - datetime.datetime.strptime(end, '%Y-%m-%d')) for start, end in zip(df["start"], df["end"])]
print(df)
start end delta
0 2000-01-10 1970-04-29 29Y9m
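If the start and end columns already hold datetime values, as in the question, the string parsing can be skipped. The following is a minimal sketch under that assumption. The helper name approx_years_months is made up for illustration, and it uses the same 364-day year and 30-day month approximation as above, so it is only meant for non-negative differences.

```python
import pandas as pd

def approx_years_months(td):
    # Same approximation as the answer above: 364-day years, 30-day months.
    years, remaining_days = divmod(td.days, 364)
    months = remaining_days // 30
    return "{}Y{}m".format(years, months)

df = pd.DataFrame({
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})

# Subtracting two datetime columns yields a timedelta column;
# each element exposes .days, which is all the helper needs.
df["delta"] = (df["start"] - df["end"]).apply(approx_years_months)
print(df)
```

Because the year and month lengths are fixed approximations, the result can differ by a month from a calendar-based count.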
Answer by DeepSpace
Pretty straightforward with relativedelta:
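from dateutil import relativedelta
>> end start
>> 0 1970-04-29 2000-01-10
df['diff'] = None  # create an object column to hold the relativedelta values
for i in df.index:
    # df.ix has been removed from pandas; df.at does the same scalar lookup by label
    df.at[i, 'diff'] = relativedelta.relativedelta(df.at[i, 'start'], df.at[i, 'end'])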
>> end start diff
>> 0 1970-04-29 2000-01-10 relativedelta(years=+29, months=+8, days=+12)
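Since the question asks for easy access to the year and month, here is a small sketch of pulling those out of the relativedelta values. It assumes the start and end columns are datetime columns; the column names years and months are chosen here for illustration.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})

# One relativedelta per row; pandas Timestamps are datetime subclasses,
# so they can be passed to relativedelta directly.
deltas = [relativedelta(s, e) for s, e in zip(df["start"], df["end"])]

df["years"] = [rd.years for rd in deltas]
df["months"] = [rd.months for rd in deltas]
df["diff"] = ["{}y{}m".format(rd.years, rd.months) for rd in deltas]
print(df)
```

Keeping the years and months as separate integer columns makes later filtering or grouping simpler than parsing them back out of a string.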
Answer by Anand S Kumar
You can try the following function to calculate the difference:
def yearmonthdiff(row):
    s = row['start']
    e = row['end']
    y = s.year - e.year
    m = s.month - e.month
    d = s.day - e.day
    if m < 0:
        # borrow twelve months from the year count
        y = y - 1
        m = m + 12
    # note: the day and time of day are only compared when the month numbers are equal
    if m == 0:
        if d < 0:
            m = m - 1
        elif d == 0:
            s1 = s.hour*3600 + s.minute*60 + s.second
            s2 = e.hour*3600 + e.minute*60 + e.second
            if s1 < s2:
                m = m - 1
    return '{}y{}m'.format(y, m)
Here row is a row of the DataFrame. I am assuming your start and end columns are datetime objects. You can then use the DataFrame.apply() function to apply it to each row.
df
Out[92]:
start end
0 2000-01-10 00:00:00.000000 1970-04-29 00:00:00.000000
1 2015-07-18 17:54:59.070381 2014-01-11 17:55:10.053381
df['diff'] = df.apply(yearmonthdiff, axis=1)
In [97]: df
Out[97]:
start end diff
0 2000-01-10 00:00:00.000000 1970-04-29 00:00:00.000000 29y9m
1 2015-07-18 17:54:59.070381 2014-01-11 17:55:10.053381 1y6m
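The function above compares the day and time only when the two month numbers are equal, which is why it reports 29y9m for the first row while relativedelta reports 8 months. As a rough alternative, the sketch below counts whole calendar months and subtracts one whenever the day and time of the later date have not yet reached those of the earlier date. The name year_month_diff is hypothetical, the function assumes later >= earlier, and it makes no special allowance for month-end dates.

```python
from datetime import datetime

def year_month_diff(later, earlier):
    """Return whole (years, months) elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # The last month is incomplete if the day/time within the month
    # has not yet reached that of the earlier date.
    if (later.day, later.time()) < (earlier.day, earlier.time()):
        months -= 1
    return divmod(months, 12)

print(year_month_diff(datetime(2000, 1, 10), datetime(1970, 4, 29)))
print(year_month_diff(datetime(2015, 7, 18, 17, 54, 59), datetime(2014, 1, 11, 17, 55, 10)))
```

It can be used with apply in the same way, for example df.apply(lambda row: year_month_diff(row['start'], row['end']), axis=1).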
Answer by Avi Gelbgiser
I think this is the most 'pandas' way to do it, without using any for loops or defining external functions:
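>>> from datetime import datetime
>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['A'], 'start': [datetime(2000, 1, 10)], 'end': [datetime(1970, 4, 29)]})
>>> # map() returns an iterator in Python 3, so wrap it in list() before assigning
>>> df['diff'] = list(map(lambda td: datetime(1, 1, 1) + td, list(df['start'] - df['end'])))
>>> df['diff'] = df['diff'].apply(lambda d: '{0}y{1}m'.format(d.year - 1, d.month - 1))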
>>> df
Name end start diff
0 A 1970-04-29 2000-01-10 29y8m
I had to use map instead of apply because of pandas' timedelta64, which doesn't allow simple addition to a datetime object.
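For comparison, a fully vectorized variant can be sketched with the .dt accessor, without any per-element function. This is not the approach from the answer above; it counts calendar months directly, ignores the time of day, and assumes start is not earlier than end.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A"],
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})

# Total calendar months between the two dates.
total_months = (df["start"].dt.year - df["end"].dt.year) * 12 \
    + (df["start"].dt.month - df["end"].dt.month)
# Drop the last month if the day of month has not been reached yet.
total_months -= (df["start"].dt.day < df["end"].dt.day).astype(int)

years = total_months // 12
months = total_months % 12
df["diff"] = years.astype(str) + "y" + months.astype(str) + "m"
print(df)
```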
Answer by scottlittle
Similar to @DeepSpace's answer, here's a SAS-like implementation:
import pandas as pd
from dateutil import relativedelta
def intck_month(start, end):
    # pd.to_datetime accepts strings as well as datetime objects
    rd = relativedelta.relativedelta(pd.to_datetime(end), pd.to_datetime(start))
    return rd.years, rd.months
Usage:
>> years, months = intck_month('1960-01-01', '1970-03-01')
>> print(years)
10
>> print(months)
2
Answer by Pranav Kansara
A much simpler way is to use the date_range function with a monthly frequency and take the length of the result, which counts the month ends between the two dates:
startdt=pd.to_datetime('2017-01-01')
enddt = pd.to_datetime('2018-01-01')
len(pd.date_range(start=startdt,end=enddt,freq='M'))
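A related sketch, offered as an alternative rather than as what the answer does, converts both timestamps to monthly periods and subtracts them. It assumes a pandas version in which subtracting two Period objects returns an offset with an n attribute, and it ignores the day of month entirely.

```python
import pandas as pd

startdt = pd.to_datetime("2017-01-01")
enddt = pd.to_datetime("2018-01-01")

# Convert each timestamp to its calendar month, then subtract the periods.
months = (enddt.to_period("M") - startdt.to_period("M")).n
years, remaining_months = divmod(months, 12)
print(years, remaining_months)
```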
Answer by jomesoke
You can try creating a new column with the difference in years like this:
df['diff_year'] = df['diff'] / np.timedelta64(1, 'Y')
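Depending on the pandas version, the 'Y' unit may be rejected because a year is not a fixed length of time. A hedged alternative is to divide the day count by an average year length; the figure 365.2425 (the mean Gregorian year) is an assumption chosen here, so the result is a fractional approximation rather than a calendar-exact value.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})
df["diff"] = df["start"] - df["end"]

# Whole days in each timedelta divided by an average year length.
df["diff_year"] = df["diff"].dt.days / 365.2425
print(df)
```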