Difference between two dates in Pandas DataFrame
Original question: http://stackoverflow.com/questions/37583870/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Abhishek Shankhadhar
I have many columns in a DataFrame, and I need to find the time difference between two columns named in_time and out_time and put it in a new column in the same DataFrame.
The timestamps are formatted like this: 2015-09-25T01:45:34.372Z.
I am using a pandas DataFrame.
I want to do something like this:
df['days'] = df['out_time'] - df['in_time']
I have many columns, and I need to add one more column named days and put the differences there.
Answer by EdChum
You need to convert the strings to datetime dtype; you can then subtract whatever date you want and call dt.days on the resulting Series:
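In [15]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'date':['2015-09-25T01:45:34.372Z']})
df
Out[15]:
                       date
0  2015-09-25T01:45:34.372Z
In [19]:
df['date'] = pd.to_datetime(df['date'])
# In recent pandas versions the trailing 'Z' makes this column tz-aware (UTC), and subtracting
# the naive dt.datetime.now() raises TypeError; use pd.Timestamp.now(tz='UTC') there instead.
df['day'] = (df['date'] - dt.datetime.now()).dt.days
df
Out[19]:
                     date  day
0 2015-09-25 01:45:34.372 -252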
Answer by danielhadar
Well, it all kinda depends on the time format you use. I'd recommend using datetime.
If in_time and out_time are currently strings, convert them with datetime.strptime():
from datetime import datetime
# Parse strings like '2015-09-25T01:45:34.372Z'; the trailing Z is matched as a literal character.
f = lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ')
df.in_time = df.in_time.apply(f)
df.out_time = df.out_time.apply(f)
and then you can simply subtract them, and assign the result to a new column named 'days':
df['days'] = df.out_time - df.in_time
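What the lambda and the subtraction do for a single row can be seen with plain datetime objects. This sketch uses two made-up timestamps in the question's format: subtracting them gives a datetime.timedelta, whose days attribute holds only whole days while total_seconds() covers the full difference. Because the trailing Z is matched literally, the parsed values are naive (they carry no timezone).

```python
from datetime import datetime

FMT = '%Y-%m-%dT%H:%M:%S.%fZ'  # same format string as the lambda above

start = datetime.strptime('2015-09-25T01:45:34.372Z', FMT)
end = datetime.strptime('2015-09-26T01:45:37.372Z', FMT)

delta = end - start           # a datetime.timedelta
print(delta)                  # 1 day, 0:00:03
print(delta.days)             # 1 (whole days only)
print(delta.total_seconds())  # 86403.0 (the full difference)
```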
Example (differences of 3 seconds and 1 day):
In[5]: df = pd.DataFrame({'in_time':['2015-09-25T01:45:34.372Z','2015-09-25T01:45:34.372Z'],
'out_time':['2015-09-25T01:45:37.372Z','2015-09-26T01:45:34.372Z']})
In[6]: df
Out[6]:
                    in_time                  out_time
0  2015-09-25T01:45:34.372Z  2015-09-25T01:45:37.372Z
1  2015-09-25T01:45:34.372Z  2015-09-26T01:45:34.372Z
In[7]: type(df.loc[0,'in_time'])
Out[7]: str
In[8]: df.in_time = df.in_time.apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))
In[9]: df.out_time = df.out_time.apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))
In[10]: df # the values now display without the T and Z, and the type is different
Out[10]:
                  in_time                out_time
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372
In[11]: type(df.loc[0,'in_time'])
Out[11]: pandas.tslib.Timestamp
And the creation of the new column:
In[12]: df['days'] = df.out_time - df.in_time
In[13]: df
Out[13]:
                  in_time                out_time            days
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372 0 days 00:00:03
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372 1 days 00:00:00
Now you can play with the output format. For example, to express the difference in minutes:
In[14]: df.days = df.days.apply(lambda x: x.total_seconds()/60)
In[15]: df
Out[15]:
                  in_time                out_time     days
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372     0.05
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372  1440.00
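The apply call above converts one element at a time and, despite the column name, stores minutes. As an alternative sketch, the .dt accessor (dt.total_seconds()) and division by a pd.Timedelta convert the whole timedelta column at once. The Series below is built by hand to mirror the days column of Out[13].

```python
import pandas as pd

# A hand-built timedelta column mirroring the "days" column shown in Out[13].
deltas = pd.Series(pd.to_timedelta(['0 days 00:00:03', '1 days 00:00:00']))

# Whole-column conversion to seconds, no apply needed.
seconds = deltas.dt.total_seconds()

# Dividing by a Timedelta expresses each difference in that unit, as floats.
minutes = deltas / pd.Timedelta(minutes=1)
days = deltas / pd.Timedelta(days=1)

print(minutes.tolist())  # [0.05, 1440.0]
print(days.tolist()[1])  # 1.0
```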
Note: regarding the in_time and out_time format, I made some assumptions (for example, that you're using a 24-hour clock, hence %H rather than %I). To adapt the format, have a look at the strptime() documentation.
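To illustrate the %H versus %I assumption: %H reads a 24-hour clock, while %I reads a 12-hour clock and needs %p (AM/PM) to tell 1 AM from 1 PM. This is a small sketch with made-up timestamps; %p matches the locale's AM/PM strings, so it assumes the default C locale.

```python
from datetime import datetime

# 24-hour clock: "13" is 1 PM.
t24 = datetime.strptime('2015-09-25 13:45:34', '%Y-%m-%d %H:%M:%S')

# 12-hour clock: "01" plus "PM" is also 1 PM.
t12 = datetime.strptime('2015-09-25 01:45:34 PM', '%Y-%m-%d %I:%M:%S %p')

print(t24 == t12)  # True
```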
Note 2: It would obviously be better to design your program to use datetime from the beginning (instead of storing strings and converting them).
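In pandas, one way to follow Note 2 is to parse the timestamps while the data is loaded. The sketch below assumes the data comes from a CSV file (simulated here with io.StringIO and made-up rows) and uses read_csv's parse_dates parameter, so in_time and out_time arrive as datetime columns and can be subtracted directly.

```python
import io

import pandas as pd

# Made-up CSV content; in practice this would be a file path.
csv_text = """in_time,out_time
2015-09-25T01:45:34.372Z,2015-09-25T01:45:37.372Z
2015-09-25T01:45:34.372Z,2015-09-26T01:45:34.372Z
"""

# parse_dates converts the listed columns to datetimes while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['in_time', 'out_time'])

# No string conversion step is needed before subtracting.
df['days'] = (df['out_time'] - df['in_time']).dt.days
print(df['days'].tolist())  # [0, 1]
```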