Original URL: http://stackoverflow.com/questions/35536053/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas date column subtraction
Asked by Neil
I have a pandas DataFrame like this:
created_time reached_time
2016-01-02 12:57:44 14:20:22
2016-01-02 12:57:44 13:01:38
2016-01-03 10:38:51 12:24:07
2016-01-03 10:38:51 12:32:11
2016-01-03 10:38:52 12:23:20
2016-01-03 10:38:52 12:51:34
2016-01-03 10:38:52 12:53:33
2016-01-03 10:38:52 13:04:08
2016-01-03 10:38:52 13:13:40
I want to subtract these two columns and get the time difference.
I am doing the following in Python:
speed['created_time'].dt.time - speed['reached_time']
But it gives me the following error:
TypeError: ufunc subtract cannot use operands with types dtype('O') and dtype('<m8[ns]')
The dtype of created_time is object, and the dtype of reached_time is timedelta64[ns].
Answered by unutbu
You could drop down to NumPy arrays and do the datetime/timedelta arithmetic there. First, create an array of dates of dtype datetime64[D]:
dates = speed['created_time'].values.astype('datetime64[D]')
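To make the effect of that cast concrete, here is a minimal sketch on a two-row sample (the name `created` and the sample values are chosen for illustration only). Casting to datetime64[D] truncates each timestamp to midnight of its day, so subtracting the result from the original values leaves the time of day as a timedelta.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample taken from the question's data.
created = pd.Series(pd.to_datetime(['2016-01-02 12:57:44',
                                    '2016-01-03 10:38:51']))

# Casting to day resolution drops the time-of-day part.
dates = created.values.astype('datetime64[D]')
print(dates)

# Original timestamps minus their dates gives the time of day as timedeltas.
time_of_day = created.values - dates
print(time_of_day)
```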
Then you have two options: you could convert reached_time to dates, and subtract dates from dates:
speed['reached_date'] = dates + speed['reached_time'].values
speed['diff'] = speed['created_time'] - speed['reached_date']
or you could convert created_time to timedeltas, and subtract timedeltas from timedeltas:
speed['created_delta'] = speed['created_time'].values - dates
speed['diff'] = speed['created_delta'] - speed['reached_time']
import pandas as pd
speed = pd.DataFrame(
    {'created_time':
     ['2016-01-02 12:57:44', '2016-01-02 12:57:44', '2016-01-03 10:38:51',
      '2016-01-03 10:38:51', '2016-01-03 10:38:52', '2016-01-03 10:38:52',
      '2016-01-03 10:38:52', '2016-01-03 10:38:52', '2016-01-03 10:38:52'],
     'reached_time':
     ['14:20:22', '13:01:38', '12:24:07', '12:32:11', '12:23:20',
      '12:51:34', '12:53:33', '13:04:08', '13:13:40']})
speed['reached_time'] = pd.to_timedelta(speed['reached_time'])
speed['created_time'] = pd.to_datetime(speed['created_time'])
dates = speed['created_time'].values.astype('datetime64[D]')
speed['reached_date'] = dates + speed['reached_time'].values
speed['diff'] = speed['created_time'] - speed['reached_date']
# alternatively
# speed['created_delta'] = speed['created_time'].values - dates
# speed['diff'] = speed['created_delta'] - speed['reached_time']
print(speed)
yields
created_time reached_time reached_date diff
0 2016-01-02 12:57:44 14:20:22 2016-01-02 14:20:22 -1 days +22:37:22
1 2016-01-02 12:57:44 13:01:38 2016-01-02 13:01:38 -1 days +23:56:06
2 2016-01-03 10:38:51 12:24:07 2016-01-03 12:24:07 -1 days +22:14:44
3 2016-01-03 10:38:51 12:32:11 2016-01-03 12:32:11 -1 days +22:06:40
4 2016-01-03 10:38:52 12:23:20 2016-01-03 12:23:20 -1 days +22:15:32
5 2016-01-03 10:38:52 12:51:34 2016-01-03 12:51:34 -1 days +21:47:18
6 2016-01-03 10:38:52 12:53:33 2016-01-03 12:53:33 -1 days +21:45:19
7 2016-01-03 10:38:52 13:04:08 2016-01-03 13:04:08 -1 days +21:34:44
8 2016-01-03 10:38:52 13:13:40 2016-01-03 13:13:40 -1 days +21:25:12
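As a quick sanity check, the two options described above should agree. The sketch below runs both on a two-row sample and compares the results; the variable names `diff_dates` and `diff_deltas` are illustrative and not part of the original answer.

```python
import pandas as pd

# Two-row sample from the question, already converted to proper dtypes.
speed = pd.DataFrame({
    'created_time': pd.to_datetime(['2016-01-02 12:57:44',
                                    '2016-01-03 10:38:51']),
    'reached_time': pd.to_timedelta(['14:20:22', '12:24:07']),
})
dates = speed['created_time'].values.astype('datetime64[D]')

# Option 1: turn reached_time into full datetimes, then subtract datetimes.
reached_date = dates + speed['reached_time'].values
diff_dates = speed['created_time'] - reached_date

# Option 2: turn created_time into timedeltas, then subtract timedeltas.
created_delta = pd.Series(speed['created_time'].values - dates)
diff_deltas = created_delta - speed['reached_time']

# Both routes should give the same differences.
print((diff_dates == diff_deltas).all())
print(diff_dates)
```

Which option to use is mostly a matter of whether the intermediate column (a full reached_date or a created_delta) is useful elsewhere.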
Using HYRY's improvement, you can do the computation without dropping down to NumPy arrays (i.e. no need to access .values):
dates = speed['created_time'].dt.normalize()
speed['reached_date'] = dates + speed['reached_time']
speed['diff'] = speed['created_time'] - speed['reached_date']
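A negative result such as "-1 days +22:37:22" can be hard to read: it is how pandas displays minus 01:22:38. The sketch below, on a two-row sample, shows two ways that might make the result easier to interpret, assuming that signed seconds or a positive elapsed time is what you want. The column names `diff_seconds` and `elapsed` are made up for this example.

```python
import pandas as pd

speed = pd.DataFrame({
    'created_time': pd.to_datetime(['2016-01-02 12:57:44',
                                    '2016-01-03 10:38:51']),
    'reached_time': pd.to_timedelta(['14:20:22', '12:24:07']),
})

# Same computation as above, without leaving pandas.
dates = speed['created_time'].dt.normalize()
speed['reached_date'] = dates + speed['reached_time']
speed['diff'] = speed['created_time'] - speed['reached_date']

# Signed difference in seconds (negative when reached is after created).
speed['diff_seconds'] = speed['diff'].dt.total_seconds()

# Reversing the operands gives a positive elapsed time instead.
speed['elapsed'] = speed['reached_date'] - speed['created_time']

print(speed[['diff', 'diff_seconds', 'elapsed']])
```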
Answered by HYRY
Convert the created_time column to datetime first:
df["created_time"] = pd.to_datetime(df["created_time"])
Then use df["created_time"] - df["created_time"].dt.normalize() to get the time part as a timedelta type.
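Put together, the two steps might look like the following sketch. It assumes reached_time is already timedelta64, as stated in the question, and uses a two-row sample; the name `created_delta` is illustrative.

```python
import pandas as pd

# created_time starts out as strings; reached_time is already a timedelta.
df = pd.DataFrame({
    'created_time': ['2016-01-02 12:57:44', '2016-01-03 10:38:51'],
    'reached_time': pd.to_timedelta(['14:20:22', '12:24:07']),
})

# Step 1: parse the strings into datetimes.
df['created_time'] = pd.to_datetime(df['created_time'])

# Step 2: subtract midnight of the same day to get the time part as a timedelta.
created_delta = df['created_time'] - df['created_time'].dt.normalize()

# Both operands are now timedeltas, so the subtraction is well defined.
df['diff'] = created_delta - df['reached_time']
print(df)
```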