Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37583870/

Time: 2020-09-14 01:19:55  Source: igfitidea

Difference between two dates in Pandas DataFrame

Tags: python-2.7, pandas, dataframe, machine-learning

Asked by Abhishek Shankhadhar

I have many columns in a data frame, and I have to find the time difference between two columns named in_time and out_time and put it in a new column in the same data frame.

The time format looks like this: 2015-09-25T01:45:34.372Z.

I am using a pandas DataFrame.

I want to do something like this:

df.days = df.out_time - df.in_time


The data frame already has many columns, and I need to add one more column named days and put the differences there.

Answered by EdChum

You need to convert the strings to datetime dtype; you can then subtract whatever arbitrary date you want, and call dt.days on the resulting series:

In [15]:
import datetime as dt
import pandas as pd

df = pd.DataFrame({'date':['2015-09-25T01:45:34.372Z']})
df

Out[15]:
                       date
0  2015-09-25T01:45:34.372Z

In [19]:
df['date'] = pd.to_datetime(df['date'])
df['day'] = (df['date'] - dt.datetime.now()).dt.days
df

Out[19]:
                     date  day
0 2015-09-25 01:45:34.372 -252
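
Applied to the two columns from the question, the same idea might look like the sketch below. The sample values are made up for illustration, and passing utc=True is an assumption made here so that both columns are timezone-aware and subtract cleanly; recent pandas versions parse the trailing Z as UTC, in which case subtracting a naive value such as dt.datetime.now() may raise an error.

```python
import pandas as pd

# Illustrative sample data in the format given in the question.
df = pd.DataFrame({
    "in_time": ["2015-09-25T01:45:34.372Z", "2015-09-25T01:45:34.372Z"],
    "out_time": ["2015-09-25T01:45:37.372Z", "2015-09-26T01:45:34.372Z"],
})

# utc=True makes both columns timezone-aware (UTC), so they subtract cleanly.
df["in_time"] = pd.to_datetime(df["in_time"], utc=True)
df["out_time"] = pd.to_datetime(df["out_time"], utc=True)

# Subtracting two datetime columns yields a timedelta column;
# .dt.days keeps only the whole-day component of each difference.
df["days"] = (df["out_time"] - df["in_time"]).dt.days

print(df["days"].tolist())
```

Note that .dt.days truncates to whole days, so a 3-second difference shows up as 0.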

Answered by danielhadar

Well, it all kinda depends on the time format you use. I'd recommend using datetime.


If in_time and out_time are currently strings, convert them with datetime.strptime():

from datetime import datetime

f = lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ')
df.in_time = df.in_time.apply(f)
df.out_time = df.out_time.apply(f)

and then you can simply subtract them, and assign the result to a new column named 'days':


df['days'] = df.out_time - df.in_time

Example (differences of 3 seconds and 1 day):

In[5]: df = pd.DataFrame({'in_time':['2015-09-25T01:45:34.372Z','2015-09-25T01:45:34.372Z'],
                          'out_time':['2015-09-25T01:45:37.372Z','2015-09-26T01:45:34.372Z']})
In[6]: df
Out[6]: 
                    in_time                  out_time
0  2015-09-25T01:45:34.372Z  2015-09-25T01:45:37.372Z
1  2015-09-25T01:45:34.372Z  2015-09-26T01:45:34.372Z

In[7]: type(df.loc[0,'in_time'])
Out[7]: str

In[8]: df.in_time = df.in_time.apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))   
In[9]: df.out_time = df.out_time.apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))

In[10]: df    # notice that it looks almost the same, but the type is different
Out[10]: 
                  in_time                out_time
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372

In[11]: type(df.loc[0,'in_time'])
Out[11]: pandas.tslib.Timestamp

And the creation of the new column:


In[12]: df['days'] = df.out_time - df.in_time
In[13]: df
Out[13]: 
                  in_time                out_time            days
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372 0 days 00:00:03
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372 1 days 00:00:00

Now you can play with the output format. For example, to express the difference in minutes:

In[14]: df.days = df.days.apply(lambda x: x.total_seconds()/60)
In[15]: df
Out[15]: 
                  in_time                out_time     days
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372     0.05
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372  1440.00
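
As an alternative to apply with a lambda, a timedelta column can also be converted with vectorized operations. The sketch below is a small standalone illustration, not part of the original answer; the Series named delta is a hypothetical stand-in for the days column above.

```python
import pandas as pd

# Stand-in for the timedelta column computed above (3 seconds and 1 day).
delta = pd.Series(pd.to_timedelta(["0 days 00:00:03", "1 days 00:00:00"]))

# .dt.total_seconds() converts every element at once, no lambda needed.
minutes = delta.dt.total_seconds() / 60

# Dividing by a Timedelta unit gives the duration as a float in that unit.
hours = delta / pd.Timedelta(hours=1)

print(minutes.tolist())
print(hours.tolist())
```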

Note: Regarding the in_time and out_time format, notice that I made some assumptions (for example, that you're using a 24-hour clock, hence %H and not %I). To play with the format, have a look at the strptime() documentation.
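
To make the %H versus %I distinction concrete, here is a minimal sketch. The second timestamp string and its format are invented for illustration (the question only uses the 24-hour ISO-style format), and %p parsing of AM/PM assumes an English-style locale.

```python
from datetime import datetime

# 24-hour clock: %H accepts hours 00-23.
t24 = datetime.strptime("2015-09-25T13:45:34.372Z", "%Y-%m-%dT%H:%M:%S.%fZ")

# 12-hour clock: %I accepts hours 01-12 and is usually paired with %p (AM/PM).
t12 = datetime.strptime("2015-09-25 01:45:34 PM", "%Y-%m-%d %I:%M:%S %p")

# Both strings describe the same hour of the day.
print(t24.hour, t12.hour)
```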

Note 2: It would obviously be better if you can design your program to use datetime from the beginning (instead of using strings and converting them).
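
One way to get datetime values from the start, assuming the data comes from a CSV file, is to let pandas parse the columns while reading. This is only a sketch: the question does not say where the data comes from, and the inline CSV text below is a hypothetical stand-in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = """in_time,out_time
2015-09-25T01:45:34.372Z,2015-09-25T01:45:37.372Z
2015-09-25T01:45:34.372Z,2015-09-26T01:45:34.372Z
"""

# parse_dates asks read_csv to convert the named columns to datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["in_time", "out_time"])

# The columns are already datetimes, so they can be subtracted directly.
df["days"] = (df["out_time"] - df["in_time"]).dt.days

print(df.dtypes)
```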