Python: calculate the time difference between Pandas DataFrame indices
Original question: http://stackoverflow.com/questions/16777570/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Calculate time difference between Pandas Dataframe indices
Asked by ghpguru
I am trying to add a column of deltaT to a dataframe where deltaT is the time difference between the successive rows (as indexed in the timeseries).
time                 value
2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7
The desired result is something like the following (deltaT shown in minutes):
time                 value  deltaT
2012-03-16 23:50:00      1       0
2012-03-16 23:56:00      2       6
2012-03-17 00:08:00      3      12
2012-03-17 00:10:00      4       2
2012-03-17 00:12:00      5       2
2012-03-17 00:20:00      6       8
2012-03-20 00:43:00      7      23
Answer by Jeff
Note that this uses numpy >= 1.7; for numpy < 1.7, see the conversion here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas
Your original frame, with a datetime index:
In [196]: df
Out[196]: 
                     value
2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7
In [199]: df.index
Out[199]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-16 23:50:00, ..., 2012-03-20 00:43:00]
Length: 7, Freq: None, Timezone: None
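To follow along, the example frame can be rebuilt from the values shown in the question. This is only a sketch: the index name `time` and the use of `pd.DatetimeIndex` are assumptions about how the original frame was constructed, which the question does not show.

```python
import pandas as pd

# Timestamps and values copied from the question.
times = [
    "2012-03-16 23:50:00",
    "2012-03-16 23:56:00",
    "2012-03-17 00:08:00",
    "2012-03-17 00:10:00",
    "2012-03-17 00:12:00",
    "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
]

# Assumption: the frame has a single "value" column and a DatetimeIndex.
df = pd.DataFrame(
    {"value": list(range(1, 8))},
    index=pd.DatetimeIndex(times, name="time"),
)
print(df)
```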
Here is the timedelta64 of what you want:
In [200]: df['tvalue'] = df.index
In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)
In [202]: df
Out[202]: 
                     value              tvalue            delta
2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00
2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00
2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00
2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00
2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00
2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00
2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00
Getting the answer while disregarding the day difference (your last day is 3/20, the prior one is 3/17) is actually tricky:
In [204]: df['ans'] = df['delta'].apply(lambda x: x  / np.timedelta64(1,'m')).astype('int64') % (24*60)
In [205]: df
Out[205]: 
                     value              tvalue            delta  ans
2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00    0
2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00    6
2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00   12
2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00    2
2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00    2
2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00    8
2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00   23
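The `% (24*60)` step above is what discards the whole days. Here is a small sketch of that arithmetic for the last row only, using a standalone `pd.Timedelta` rather than the frame (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# The gap before the last row in the example: 3 days and 23 minutes.
delta = pd.Timedelta(days=3, minutes=23)

# Dividing by a one-minute timedelta gives the full gap in minutes.
total_minutes = int(delta / np.timedelta64(1, "m"))
minutes_in_day = 24 * 60

print(total_minutes)                   # 4343
print(total_minutes % minutes_in_day)  # 23
```

The modulo keeps only the remainder after removing full days, which is why the last row shows 23 rather than 4343.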
Answer by Nickil Maveli
We can create a series with both index and values equal to the index keys using to_series, and then compute the differences between successive rows, which results in a timedelta64[ns] dtype. After obtaining this, via the .dt property, we can access the seconds attribute of the time portion and finally divide each element by 60 to get the result in minutes (optionally filling the first value with 0).
In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
    ...: df                                 # use .astype(int) to obtain integer values
Out[13]: 
                     value  deltaT
time                              
2012-03-16 23:50:00      1     0.0
2012-03-16 23:56:00      2     6.0
2012-03-17 00:08:00      3    12.0
2012-03-17 00:10:00      4     2.0
2012-03-17 00:12:00      5     2.0
2012-03-17 00:20:00      6     8.0
2012-03-20 00:43:00      7    23.0
Simplification:
When we perform diff:
In [8]: ser_diff = df.index.to_series().diff()
In [9]: ser_diff
Out[9]: 
time
2012-03-16 23:50:00               NaT
2012-03-16 23:56:00   0 days 00:06:00
2012-03-17 00:08:00   0 days 00:12:00
2012-03-17 00:10:00   0 days 00:02:00
2012-03-17 00:12:00   0 days 00:02:00
2012-03-17 00:20:00   0 days 00:08:00
2012-03-20 00:43:00   3 days 00:23:00
Name: time, dtype: timedelta64[ns]
Seconds-to-minutes conversion:
In [10]: ser_diff.dt.seconds.div(60, fill_value=0)
Out[10]: 
time
2012-03-16 23:50:00     0.0
2012-03-16 23:56:00     6.0
2012-03-17 00:08:00    12.0
2012-03-17 00:10:00     2.0
2012-03-17 00:12:00     2.0
2012-03-17 00:20:00     8.0
2012-03-20 00:43:00    23.0
Name: time, dtype: float64
If you also want to include the date portion, which was excluded previously (only the time portion was considered), dt.total_seconds gives the elapsed duration in seconds, from which minutes can again be calculated by division.
In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0)
Out[12]: 
time
2012-03-16 23:50:00       0.0
2012-03-16 23:56:00       6.0
2012-03-17 00:08:00      12.0
2012-03-17 00:10:00       2.0
2012-03-17 00:12:00       2.0
2012-03-17 00:20:00       8.0
2012-03-20 00:43:00    4343.0    # <-- number of minutes in 3 days 23 minutes
Name: time, dtype: float64
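The difference between the two accessors is easiest to see on a single value. The sketch below uses a scalar `pd.Timedelta` equal to the last gap in the example; on a Series the same attributes are reached through `.dt`.

```python
import pandas as pd

# The last gap in the example: 3 days and 23 minutes.
gap = pd.Timedelta("3 days 00:23:00")

print(gap.days)                  # 3
print(gap.seconds)               # 1380  (time-of-day part only)
print(gap.total_seconds())       # 260580.0  (days included)

print(gap.seconds / 60)          # 23.0
print(gap.total_seconds() / 60)  # 4343.0
```

So `seconds` is the right choice when whole days should be ignored, and `total_seconds()` when the full elapsed time is wanted.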
Answer by Shijith
You can also typecast df.index.to_series().diff() from timedelta64[ns] (nanoseconds, the default dtype) to timedelta64[m] (minutes):
df['ΔT'] = df.index.to_series().diff().astype('timedelta64[m]')
                     value      ΔT
time                              
2012-03-16 23:50:00      1     NaN
2012-03-16 23:56:00      2     6.0
2012-03-17 00:08:00      3    12.0
2012-03-17 00:10:00      4     2.0
2012-03-17 00:12:00      5     2.0
2012-03-17 00:20:00      6     8.0
2012-03-20 00:43:00      7  4343.0
(ΔT dtype: float64)
If you want to convert to int, fill the NA values with 0 before converting:
>>> df.index.to_series().diff().fillna(0).astype('timedelta64[m]').astype('int')
time
2012-03-16 23:50:00       0
2012-03-16 23:56:00       6
2012-03-17 00:08:00      12
2012-03-17 00:10:00       2
2012-03-17 00:12:00       2
2012-03-17 00:20:00       8
2012-03-20 00:43:00    4343
Name: time, dtype: int64
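The behaviour of `astype('timedelta64[m]')` has changed across pandas releases, and recent versions may reject minute resolution. As a hedged alternative (not part of the original answer), dividing by a one-minute `pd.Timedelta` gives the same numbers and should not depend on that cast. The three-row index here is a shortened, illustrative copy of the example data.

```python
import pandas as pd

# Last three timestamps from the example, enough to show both a
# same-day gap and a multi-day gap.
idx = pd.DatetimeIndex(
    ["2012-03-17 00:12:00", "2012-03-17 00:20:00", "2012-03-20 00:43:00"],
    name="time",
)

gaps = idx.to_series().diff()             # timedelta Series, NaT in first row
minutes = gaps / pd.Timedelta(minutes=1)  # float64, NaN in first row

# Fill the missing first value after the division, then cast to int.
minutes_int = minutes.fillna(0).astype("int64")
print(minutes_int.tolist())  # [0, 8, 4343]
```

Filling after the division avoids putting a plain `0` into a timedelta column, which not every pandas version accepts.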
Timedelta data types support a large number of time units, as well as generic units which can be coerced into any of the other units.
Below are the date units:
Y   year
M   month
W   week
D   day
Below are the time units:
h   hour
m   minute
s   second
ms  millisecond
us  microsecond
ns  nanosecond
ps  picosecond
fs  femtosecond
as  attosecond
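As a small illustration of how these unit codes are used, a `np.timedelta64` value can be divided by one unit of another size to express it in that unit. This is a sketch with NumPy scalars only; the variable names are illustrative.

```python
import numpy as np

one_day = np.timedelta64(1, "D")
one_week = np.timedelta64(1, "W")

# Dividing two timedelta64 values yields a plain float ratio.
print(one_day / np.timedelta64(1, "h"))   # 24.0
print(one_day / np.timedelta64(1, "m"))   # 1440.0
print(one_week / np.timedelta64(1, "D"))  # 7.0
```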

