Pandas - convert strings to time without date
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/32375471/
Asked by RDJ
I've read loads of SO answers but can't find a clear solution.
I have this data in a df called day1 which represents hours:
1 10:53
2 12:17
3 14:46
4 16:36
5 18:39
6 20:31
7 22:28
Name: time, dtype: object
I want to convert it into a time format. But when I do this:
day1.time = pd.to_datetime(day1.time, format='H%:M%')
The result includes today's date:
1 2015-09-03 10:53:00
2 2015-09-03 12:17:00
3 2015-09-03 14:46:00
4 2015-09-03 16:36:00
5 2015-09-03 18:39:00
6 2015-09-03 20:31:00
7 2015-09-03 22:28:00
Name: time, dtype: datetime64[ns]
It seems the format argument isn't working - how do I get the time as shown here without the date?
Update
The following formats the time correctly, but somehow the column is still an object type. Why doesn't it convert to datetime64?
day1['time'] = pd.to_datetime(day1['time'], format='%H:%M').dt.time
1 10:53:00
2 12:17:00
3 14:46:00
4 16:36:00
5 18:39:00
6 20:31:00
7 22:28:00
Name: time, dtype: object
Accepted answer by EdChum
After performing the conversion you can use the datetime accessor dt to access just the hour or time component:
In [51]:
df['hour'] = pd.to_datetime(df['time'], format='%H:%M').dt.hour
df
Out[51]:
time hour
index
1 10:53 10
2 12:17 12
3 14:46 14
4 16:36 16
5 18:39 18
6 20:31 20
7 22:28 22
Also, your format string H%:M% is malformed; it's likely to raise a ValueError: ':' is a bad directive in format 'H%:M%'
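A minimal sketch of the difference between the two format strings, assuming a recent pandas version. The series `s` is made-up sample data, and since the exact error message may vary between versions, only the exception type is checked.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column
s = pd.Series(['10:53', '12:17', '14:46'], name='time')

# Correct directives: the percent sign comes before the letter
parsed = pd.to_datetime(s, format='%H:%M')
hours = parsed.dt.hour.tolist()
print(hours)

# Malformed directives: expected to be rejected with a ValueError
try:
    pd.to_datetime(s, format='H%:M%')
    malformed_rejected = False
except ValueError:
    malformed_rejected = True
print(malformed_rejected)
```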
Regarding your last comment, the values are datetime.time objects, not datetime, which is why the column's dtype is object:
In [53]:
df['time'].iloc[0]
Out[53]:
datetime.time(10, 53)
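To see this on a whole column, the following sketch (again with made-up sample data) inspects both the column dtype and the type of one element. It assumes pandas' default NumPy-backed dtypes, which have no dedicated time-of-day dtype, so datetime.time values are stored as generic Python objects.

```python
import datetime
import pandas as pd

# Hypothetical sample data
s = pd.Series(['10:53', '12:17', '14:46'], name='time')
times = pd.to_datetime(s, format='%H:%M').dt.time

print(times.dtype)          # dtype of the column as a whole
print(type(times.iloc[0]))  # type of a single element
first_matches = times.iloc[0] == datetime.time(10, 53)
print(first_matches)
```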
Answer by YOBEN_S
You can use to_timedelta:
pd.to_timedelta(df+':00')
Out[353]:
1 10:53:00
2 12:17:00
3 14:46:00
4 16:36:00
5 18:39:00
6 20:31:00
7 22:28:00
Name: Time, dtype: timedelta64[ns]
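A short sketch of why the timedelta representation can be convenient, using a hypothetical series `s`: appending ':00' turns each 'HH:MM' string into 'HH:MM:SS', which to_timedelta can parse, and the result supports arithmetic and comparisons.

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(['10:53', '12:17', '14:46'], name='time')
td = pd.to_timedelta(s + ':00')

gaps = td.diff()                          # time elapsed between consecutive rows
after_noon = td > pd.Timedelta(hours=12)  # element-wise comparison
print(gaps.iloc[1])
print(after_noon.tolist())
```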
Answer by Bowen Liu
I recently also struggled with this problem. My method is close to EdChum's method and the result is the same as YOBEN_S's answer.
As EdChum illustrated, dt.time gives you datetime.time objects (dt.hour gives plain integers instead), which are probably only good for display. I can barely do any comparison or calculation on datetime.time objects. So if you need further comparison or calculation on the result column, it's better to avoid this data format.
My method is to simply subtract the date from the to_datetime result:
import pandas as pd
c = pd.Series(['10:23', '12:17', '14:46'])
pd.to_datetime(c, format='%H:%M') - pd.to_datetime(c, format='%H:%M').dt.normalize()
The result is
0 10:23:00
1 12:17:00
2 14:46:00
dtype: timedelta64[ns]
dt.normalize() basically sets the time component to 00:00:00, so only the date is displayed while the datetime64 data format is kept, thereby making it possible to do calculations with it.
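As a rough check that this approach gives the same values as to_timedelta, the sketch below reuses the sample values from this answer, computes both, and compares them. The variable names are illustrative only.

```python
import pandas as pd

c = pd.Series(['10:23', '12:17', '14:46'])
parsed = pd.to_datetime(c, format='%H:%M')

midnight = parsed.dt.normalize()  # same dates, time set to 00:00:00
offset = parsed - midnight        # time of day as a timedelta

same = bool((offset == pd.to_timedelta(c + ':00')).all())
span = offset.max() - offset.min()  # arithmetic works directly
print(same)
print(span)
```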
My answer is by no means better than the other two. I just want to provide a different approach and hope it helps.