Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/31233146/

Time: 2020-09-13 23:34:17  Source: igfitidea

Convert columns of time in minutes format to time in HH:MM:SS format in pandas

python, csv, pandas

Asked by Adam Pitchie

I am using a script to convert stop times from the HH:MM:SS format into minute values and to interpolate the missing ones. The script is as follows.

import numpy as np
import pandas as pd

# read in new csv file
reindexed = pd.read_csv('output/stop_times.csv')

for col in ('arrival_time', 'departure_time'):
    # extract hh:mm:ss values
    df = reindexed[col].str.extract(
        r'(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)').astype('float')
    # convert to minutes (the seconds component is ignored)
    reindexed[col] = df['hour'] * 60 + df['minute']
    # interpolate missing values
    reindexed[col] = reindexed[col].interpolate()
    reindexed[col] = np.round(reindexed[col], decimals=2)

# write the result once, after both columns are processed
reindexed.to_csv('output/stop_times.csv', index=False)

# convert minutes back to HH:MM:SS

What I would now like is to convert those values back into a HH:MM:SS format, but I am having trouble figuring that out. I have a hunch that the method is hidden somewhere in the timeseries documentation, but I have come up short of a result.

Here is a sample CSV derived from the larger stop_times.csv file that I am using. The arrival_time and departure_time columns are the focus:

stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,trip_id,arrival_time,departure_time,stop_sequence,pickup_type,drop_off_type,stop_headsign
02303,02303,LCC Station Bay C,lcc_c,44.00981229999999,-123.0351463,,99994.0,1475360,707.0,707.0,1,0,0,82 EUGENE STATION
01092,01092,N/S of 30th E of University,,44.0242826,-123.07484540000002,,,1475360,709.67,709.67,2,0,0,82 EUGENE STATION
01089,01089,N/S of 30th W of Alder,,44.0242545,-123.08092409999999,,,1475360,712.33,712.33,3,0,0,82 EUGENE STATION
01409,01409,"Amazon Station, Bay A",amz_a,44.026660799999995,-123.08448870000001,,99993.0,1475360,715.0,715.0,4,0,0,82 EUGENE STATION
01222,01222,E/S of Amazon Prkwy N of 24th,,44.0339371,-123.0887632,,,1475360,715.75,715.75,5,0,0,82 EUGENE STATION
01548,01548,E/S of Amazon Pkwy S of 19th,,44.038014700000005,-123.0896553,,,1475360,716.5,716.5,6,0,0,82 EUGENE STATION

Here is a reference for deriving HH:MM:SS values from a time value in minutes:

78.6 minutes can be converted to hours by dividing by 60 minutes/hour: 78.6 / 60 = 1.31 hours
1.31 hours can be broken down into 1 hour plus 0.31 hours → 1 hour
0.31 hours * 60 minutes/hour = 18.6 minutes → 18 minutes
0.6 minutes * 60 seconds/minute = 36 seconds → 36 seconds
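
The arithmetic above can be sketched in plain Python with divmod. This is only an illustration: the function name minutes_to_hhmmss is hypothetical, rounding to the nearest whole second is an assumption, and hours are not wrapped at 24.

```python
def minutes_to_hhmmss(minutes):
    """Format a duration given in (possibly fractional) minutes as HH:MM:SS."""
    # Assumption: round to the nearest whole second before splitting.
    total_seconds = int(round(minutes * 60))
    # Split off whole hours, then whole minutes; what is left is seconds.
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return '{:02d}:{:02d}:{:02d}'.format(hours, mins, secs)


print(minutes_to_hhmmss(78.6))  # 01:18:36
```

Applied to a value from the sample CSV, minutes_to_hhmmss(715.75) gives 11:55:45.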

Any help is much appreciated. Thanks in advance!

Answered by unutbu

Per the previous question, perhaps the best thing to do would be to keep the original HH:MM:SS strings:

So instead of

for col in ('arrival_time', 'departure_time'):
    df = reindexed[col].str.extract(
        r'(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)').astype('float')
    reindexed[col] = df['hour'] * 60 + df['minute']

use

for col in ('arrival_time', 'departure_time'):
    newcol = '{}_minutes'.format(col)
    df = reindexed[col].str.extract(
        r'(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)').astype('float')
    reindexed[newcol] = df['hour'] * 60 + df['minute']

Then you don't have to do any new calculations to recover the HH:MM:SS strings. reindexed['arrival_time'] will still hold the original HH:MM:SS strings, and reindexed['arrival_time_minutes'] will hold the time duration in minutes.
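
As a rough, self-contained illustration of that approach, the sketch below runs the same loop on a small made-up DataFrame. The sample times are invented for the example and stand in for the real stop_times.csv.

```python
import pandas as pd

# Made-up sample data standing in for output/stop_times.csv.
reindexed = pd.DataFrame({
    'arrival_time': ['11:47:00', '11:55:00'],
    'departure_time': ['11:47:00', '11:56:00'],
})

for col in ('arrival_time', 'departure_time'):
    newcol = '{}_minutes'.format(col)
    parts = reindexed[col].str.extract(
        r'(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)').astype('float')
    # Store the minutes in a new column so the original strings survive.
    reindexed[newcol] = parts['hour'] * 60 + parts['minute']

print(reindexed)
```

Both the original string columns and the new *_minutes columns are present afterwards, so nothing has to be converted back.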

Building on Jianxun Li's solution, to chop off the microseconds, you could multiply the minutes by 60 and then call astype(int):

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(3) * 1000, columns=['minutes'])
df['HH:MM:SS'] = pd.to_timedelta((60*df['minutes']).astype('int'), unit='s')

which yields

      minutes  HH:MM:SS
0  548.813504  09:08:48
1  715.189366  11:55:11
2  602.763376  10:02:45

Note that the df['HH:MM:SS'] column contains pd.Timedeltas:

In [240]: df['HH:MM:SS'].iloc[0]
Out[240]: Timedelta('0 days 09:08:48')

However, if you try to store this data in a csv

In [223]: df.to_csv('/tmp/out', date_format='%H:%M:%S')

you get:

,minutes,HH:MM:SS
0,548.813503927,0 days 09:08:48.000000000
1,715.189366372,0 days 11:55:11.000000000
2,602.763376072,0 days 10:02:45.000000000

If the minute values are too big, you would also see days as part of the timedelta string representation:

np.random.seed(0)
df = pd.DataFrame(np.random.rand(3) * 10000, columns=['minutes'])
df['HH:MM:SS'] = pd.to_timedelta((60*df['minutes']).astype('int'), unit='s')

yields

       minutes        HH:MM:SS
0  5488.135039 3 days 19:28:08
1  7151.893664 4 days 23:11:53
2  6027.633761 4 days 04:27:38

That might not be what you want. In that case, instead of

df['HH:MM:SS'] = pd.to_timedelta((60*df['minutes']).astype('int'), unit='s')

per Phillip Cloud's solution, you could use

import operator
fmt = operator.methodcaller('strftime', '%H:%M:%S')
df['HH:MM:SS'] = pd.to_datetime(df['minutes'], unit='m').map(fmt)

The result looks the same, but now the df['HH:MM:SS'] column contains strings:

In [244]: df['HH:MM:SS'].iloc[0]
Out[244]: '09:08:48'

Note that this chops off (omits) both the whole days and the microseconds. Writing the DataFrame to a CSV

In [229]: df.to_csv('/tmp/out', date_format='%H:%M:%S')

now yields

,minutes,HH:MM:SS
0,548.813503927,09:08:48
1,715.189366372,11:55:11
2,602.763376072,10:02:45
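
If the whole days should be folded into the hour count rather than dropped, one possible sketch is to format the seconds by hand. This case is not covered by the answer itself: the helper name format_hhmmss is hypothetical, and letting the hours run past 23 is an assumption about the desired output.

```python
import pandas as pd

# Two of the minute values used above: one under a day, one over.
df = pd.DataFrame({'minutes': [548.813504, 5488.135039]})


def format_hhmmss(total_seconds):
    """Format whole seconds as HH:MM:SS, letting hours run past 23."""
    hours, remainder = divmod(int(total_seconds), 3600)
    mins, secs = divmod(remainder, 60)
    return '{:02d}:{:02d}:{:02d}'.format(hours, mins, secs)


# Truncate to whole seconds, mirroring the astype('int') step above.
seconds = (60 * df['minutes']).astype('int64')
df['HH:MM:SS'] = seconds.map(format_hhmmss)
print(df)
```

With this sketch the second row becomes '91:28:08' (3 days plus 19 hours), and the column holds plain strings, so it is written to CSV unchanged.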

Answered by Jianxun Li

You may want to consider using pd.to_timedelta.

import pandas as pd
import numpy as np

np.random.seed(0)
df = pd.DataFrame(np.random.rand(10) * 1000, columns=['time_in_minutes'])

Out[94]: 
   time_in_minutes
0         548.8135
1         715.1894
2         602.7634
3         544.8832
4         423.6548
5         645.8941
6         437.5872
7         891.7730
8         963.6628
9         383.4415

# As Jeff suggests, pd.to_timedelta is a very handy tool to do this
df['time_delta'] = pd.to_timedelta(df.time_in_minutes, unit='m')


Out[96]: 
   time_in_minutes      time_delta
0         548.8135 09:08:48.810235
1         715.1894 11:55:11.361982
2         602.7634 10:02:45.802564
3         544.8832 09:04:52.990979
4         423.6548 07:03:39.287960
5         645.8941 10:45:53.646784
6         437.5872 07:17:35.232675
7         891.7730 14:51:46.380046
8         963.6628 16:03:39.765630
9         383.4415 06:23:26.491129
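
The time_delta values above keep their sub-second part. If whole seconds are enough, one option (an addition here, not part of the original answer) is to round each value down with the timedelta accessor's floor method. The column name time_delta_seconds is made up for the example.

```python
import pandas as pd

# Two of the values from the table above.
df = pd.DataFrame({'time_in_minutes': [548.8135, 715.1894]})
df['time_delta'] = pd.to_timedelta(df['time_in_minutes'], unit='m')

# Round each Timedelta down to a whole second, dropping the fraction.
df['time_delta_seconds'] = df['time_delta'].dt.floor('s')
print(df)
```

The result is still a Timedelta column, so the caveats about the "0 days" prefix when writing to CSV continue to apply.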