Pandas Python - convert HH:MM:SS into seconds in aggregate (csv file)
Original question: http://stackoverflow.com/questions/28845825/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Yumi
I'm trying to convert the values in the 'Avg. Session Duration' (HH:MM:SS) column into whole numbers (in seconds) after reading the file with the Pandas read_csv function.
For instance, '0:03:26' would be 206 seconds after the conversion.
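Written out, the conversion is hours × 3600 + minutes × 60 + seconds, so '0:03:26' gives 0 × 3600 + 3 × 60 + 26 = 206. A minimal sketch of that arithmetic, assuming every value has exactly three colon-separated fields (the helper name hms_to_seconds is hypothetical):

```python
def hms_to_seconds(hms):
    """Convert an 'H:MM:SS' string to a whole number of seconds."""
    hours, minutes, seconds = map(int, hms.split(':'))
    # each hour is 3600 seconds, each minute 60 seconds
    return hours * 3600 + minutes * 60 + seconds


print(hms_to_seconds('0:03:26'))  # 206
print(hms_to_seconds('1:00:00'))  # 3600
```

Testing a value with a nonzero hour field is worthwhile, because '0:03:26' alone cannot reveal a wrong multiplier on the hours.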
Input example:
Source   Month   Sessions  Bounce Rate  Avg. Session Duration
ABC.com  201501  408       26.47%       0:03:26
EFG.com  201412  398       31.45%       0:04:03
I wrote a function:
def time_convert(x):
    # split 'H:MM:SS' into its hour, minute and second fields
    times = x.split(':')
    return 3600*int(times[0]) + 60*int(times[1]) + int(times[2])
This function works fine when I simply pass '0:03:26' to it. But when I tried to create a new column 'Duration' by applying the function to another column in Pandas,
import pandas as pd

df = pd.read_csv('myfile.csv')
df['Duration'] = df['Avg. Session Duration'].apply(time_convert)
it returned an error message:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-53-01e79de1cb39> in <module>()
----> 1 df['Avg. Session Duration'] = df['Avg. Session Duration'].apply(lambda x: x.split(':'))

/Users/yumiyang/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
   1991             values = lib.map_infer(values, lib.Timestamp)
   1992
-> 1993         mapped = lib.map_infer(values, f, convert=convert_dtype)
   1994         if len(mapped) and isinstance(mapped[0], Series):
   1995             from pandas.core.frame import DataFrame

/Users/yumiyang/anaconda/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:52281)()

<ipython-input-53-01e79de1cb39> in <lambda>(x)
----> 1 df['Avg. Session Duration'] = df['Avg. Session Duration'].apply(lambda x: x.split(':'))

AttributeError: 'float' object has no attribute 'split'
I don't know why it says the values of 'Avg. Session Duration' are floats.
Data columns (total 7 columns):
Source                   250 non-null object
Time                     251 non-null object
Sessions                 188 non-null object
Users                    188 non-null object
Bounce Rate              188 non-null object
Avg. Session Duration    188 non-null object
% New Sessions           188 non-null object
dtypes: object(7)
Can someone help me figure out where the problem is?
Accepted answer by jfs
The error means that the values in the column are recognized as floats, not strings. Fix the way you read the data, e.g.:
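#!/usr/bin/env python
from functools import reduce  # reduce is a builtin only on Python 2
import pandas

def hh_mm_ss2seconds(hh_mm_ss):
    # '0:03:26' -> [0, 3, 26] -> (0*60 + 3)*60 + 26 = 206
    return reduce(lambda acc, x: acc*60 + x, map(int, hh_mm_ss.split(':')))

# columns are separated by 2+ spaces; a regex separator needs the python engine
df = pandas.read_csv('input.csv', sep=r'\s{2,}', engine='python',
                     converters={'Avg. Session Duration': hh_mm_ss2seconds})
print(df)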
Output
    Source   Month  Sessions Bounce Rate  Avg. Session Duration
0  ABC.com  201501       408      26.47%                    206
1  EFG.com  201412       398      31.45%                    243

[2 rows x 5 columns]
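Judging from the column summary in the question (188 non-null values in most columns, versus 250 and 251 for Source and Time), a likely source of the floats is missing cells: pandas stores an empty cell as NaN, which is a Python float even inside an object column, so x.split fails on those rows. A minimal sketch of a converter that passes missing values through, assuming every non-missing value is an 'H:MM:SS' string (the function name to_seconds_or_nan is hypothetical):

```python
import numpy as np
import pandas as pd


def to_seconds_or_nan(value):
    # Missing cells arrive as float NaN, which has no .split(); pass them through.
    if pd.isna(value):
        return np.nan
    h, m, s = map(int, str(value).split(':'))
    return h * 3600 + m * 60 + s


# A column shaped like the question's: object dtype, but one entry is NaN.
durations = pd.Series(['0:03:26', '0:04:03', np.nan], dtype=object)
print(type(durations[2]))  # <class 'float'>

seconds = durations.apply(to_seconds_or_nan)
print(seconds.tolist())
```

Because the result mixes integers with NaN, pandas stores it as float64; call dropna() first if you need an integer column.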
Answer by JAB
The values in df['Avg. Session Duration'] must be strings for your function to work.
import pandas as pd

df = pd.DataFrame({'time': ['0:03:26']})

def time_convert(x):
    # split 'H:MM:SS' and fold it into a single number of seconds
    h, m, s = map(int, x.split(':'))
    return (h*60 + m)*60 + s

df.time.apply(time_convert)
This works fine for me.
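If the column mixes strings with missing values, one option is Series.map with na_action='ignore', which propagates NaN without calling the function on it. A minimal sketch, assuming a hypothetical one-column DataFrame with a missing entry:

```python
import numpy as np
import pandas as pd


def time_convert(x):
    h, m, s = map(int, x.split(':'))
    return (h * 60 + m) * 60 + s


# Hypothetical data: one valid duration and one missing cell (NaN).
df = pd.DataFrame({'time': ['0:03:26', np.nan]})

# na_action='ignore' skips NaN entries instead of passing them to time_convert.
df['seconds'] = df['time'].map(time_convert, na_action='ignore')
print(df)
```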
Answer by Michael Kazarian
You can convert the time to seconds with the time and datetime modules from the standard Python library:
import time, datetime

def convertTime(t):
    # parse 'H:MM:SS'; note that '%H' only accepts hours 0-23
    x = time.strptime(t, '%H:%M:%S')
    # build a timedelta from the parsed fields and return whole seconds
    return int(datetime.timedelta(hours=x.tm_hour, minutes=x.tm_min, seconds=x.tm_sec).total_seconds())

convertTime('0:03:26')  # Output: 206
convertTime('0:04:03')  # Output: 243
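As a vectorized alternative not used in the answers above, pandas' own pd.to_timedelta can parse a whole column of 'H:MM:SS' strings at once, turning missing values into NaT. A minimal sketch, assuming every non-missing value is in that format (the sample Series is hypothetical):

```python
import numpy as np
import pandas as pd

durations = pd.Series(['0:03:26', '0:04:03', np.nan])

# to_timedelta parses the strings (NaN becomes NaT); total_seconds() returns floats.
seconds = pd.to_timedelta(durations).dt.total_seconds()
print(seconds.tolist())
```

The result is float64 because of the missing value; apply astype(int) after dropping NaN if whole numbers are required.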

