Pandas Python - convert HH:MM:SS into seconds in aggregate (csv file)

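Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28845825/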

Time: 2020-09-13 23:00:45  Source: igfitidea

Tags: python, csv, time, pandas, dataframe

Question by Yumi

I'm trying to convert the values in the 'Avg. Session Duration' column (HH:MM:SS) into whole numbers (in seconds) when reading the file with the Pandas read_csv function. For instance, '0:03:26' would be 206 seconds after the conversion.
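
As a quick check of the arithmetic: an H:MM:SS value is hours * 3600 + minutes * 60 + seconds, so '0:03:26' is 0 * 3600 + 3 * 60 + 26 = 206. A minimal plain-Python sketch (the variable names are illustrative only):

```python
# Split an H:MM:SS string into its three integer fields
hours, minutes, seconds = (int(part) for part in "0:03:26".split(":"))

# One hour is 3600 seconds and one minute is 60 seconds
total_seconds = hours * 3600 + minutes * 60 + seconds
print(total_seconds)  # 206
```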

Input example:

Source       Month  Sessions    Bounce Rate     Avg. Session Duration   
ABC.com     201501   408        26.47%           0:03:26 
EFG.com     201412   398        31.45%           0:04:03

I wrote a function:

def time_convert(x):
    times = x.split(':')
    # hours * 3600 + minutes * 60 + seconds
    return 3600*int(times[0]) + 60*int(times[1]) + int(times[2])

This function works fine when I simply pass '0:03:26' to it. But when I tried to create a new column, 'Duration', by applying the function to the 'Avg. Session Duration' column in Pandas:

import pandas as pd

df = pd.read_csv('myfile.csv')
df['Duration'] = df['Avg. Session Duration'].apply(time_convert)

It raised an error:

> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-53-01e79de1cb39> in <module>()
> ----> 1 df['Avg. Session Duration'] = df['Avg. Session Duration'].apply(lambda x: x.split(':'))
>
> /Users/yumiyang/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
>    1991             values = lib.map_infer(values, lib.Timestamp)
>    1992
> -> 1993         mapped = lib.map_infer(values, f, convert=convert_dtype)
>    1994         if len(mapped) and isinstance(mapped[0], Series):
>    1995             from pandas.core.frame import DataFrame
>
> /Users/yumiyang/anaconda/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:52281)()
>
> <ipython-input-53-01e79de1cb39> in <lambda>(x)
> ----> 1 df['Avg. Session Duration'] = df['Avg. Session Duration'].apply(lambda x: x.split(':'))
>
> AttributeError: 'float' object has no attribute 'split'

I don't know why it says the values of 'Avg. Session Duration' are floats. The column summary lists it (and every other column) as object:

Data columns (total 7 columns):
Source                   250 non-null object
Time                     251 non-null object
Sessions                 188 non-null object
Users                    188 non-null object
Bounce Rate              188 non-null object
Avg. Session Duration    188 non-null object
% New Sessions           188 non-null object
dtypes: object(7)
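
The summary is a clue: the Time column has 251 non-null values, so the frame has at least 251 rows, yet 'Avg. Session Duration' has only 188. The object dtype just means the column holds arbitrary Python objects; pandas fills the missing cells with NaN, which is a float, so apply hands a float to the function for those rows. A minimal pure-Python illustration (the list is a hypothetical stand-in for the column, not the real data):

```python
import math

# Hypothetical stand-in for the column: two durations plus one missing cell,
# which pandas would represent as NaN (a Python float)
values = ["0:03:26", "0:04:03", float("nan")]

errors = []
for value in values:
    try:
        value.split(":")
    except AttributeError as exc:
        # NaN is a float, and floats have no .split() method
        errors.append((value, str(exc)))

print(errors)
```

Only the NaN entry fails, with the same AttributeError as in the traceback.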

Can someone help me figure out where the problem is?

Accepted answer by jfs

The error means that values in the column are recognized as floats, not strings. Fix the way you read the data, e.g.:

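#!/usr/bin/env python
from functools import reduce  # reduce is not a builtin in Python 3
import pandas

def hh_mm_ss2seconds(hh_mm_ss):
    # base-60 accumulation: '0:03:26' -> (0*60 + 3)*60 + 26 = 206
    return reduce(lambda acc, x: acc*60 + x, map(int, hh_mm_ss.split(':')))

# columns in the sample are separated by two or more spaces; the C engine
# does not support regex separators, so use the python engine
df = pandas.read_csv('input.csv', sep=r'\s{2,}', engine='python',
                     converters={'Avg. Session Duration': hh_mm_ss2seconds})
print(df)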

Output

    Source   Month  Sessions Bounce Rate  Avg. Session Duration
0  ABC.com  201501       408      26.47%                    206
1  EFG.com  201412       398      31.45%                    243

[2 rows x 5 columns]
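
If the real file has blank cells (the question's summary shows only 188 non-null values), a converter like hh_mm_ss2seconds may receive an empty string, and int('') raises ValueError. A hedged sketch of a more defensive converter, assuming you want NaN for blank or missing cells; the function name is hypothetical, not part of pandas:

```python
import math
from functools import reduce


def hh_mm_ss_to_seconds_or_nan(value):
    """Convert 'H:MM:SS' to whole seconds; return NaN for blank or missing cells.

    Hypothetical helper for illustration, not a pandas API.
    """
    # Missing cells may arrive as NaN (a float) or as an empty/whitespace string
    if not isinstance(value, str) or not value.strip():
        return math.nan
    # Same base-60 accumulation as hh_mm_ss2seconds
    return reduce(lambda acc, x: acc * 60 + x, map(int, value.split(":")))


print(hh_mm_ss_to_seconds_or_nan("0:03:26"))  # 206
print(hh_mm_ss_to_seconds_or_nan(""))         # nan
```

It could be passed as converters={'Avg. Session Duration': hh_mm_ss_to_seconds_or_nan} or used with Series.apply. Note that a column mixing integers and NaN ends up as float64 in pandas.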

Answer by JAB

The values in df['Avg. Session Duration'] should be strings for your function to work.

import pandas as pd

df = pd.DataFrame({'time': ['0:03:26']})

def time_convert(x):
    # x must be a str such as '0:03:26'
    h, m, s = map(int, x.split(':'))
    return (h*60 + m)*60 + s

df.time.apply(time_convert)

This works fine for me.
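
A different, vectorized route (not the approach in JAB's answer) is pandas' own pd.to_timedelta, which parses 'H:MM:SS' strings and turns missing values into NaT instead of raising; .dt.total_seconds() then gives float seconds with NaN for the gaps. A minimal sketch, assuming the column holds only duration strings or missing values:

```python
import pandas as pd

# Sample column with one missing value, mimicking the question's data
df = pd.DataFrame({"time": ["0:03:26", "0:04:03", None]})

# Parse the strings as durations; None becomes NaT instead of raising
durations = pd.to_timedelta(df["time"])

# Seconds come back as floats, because NaN cannot live in an int64 column
df["seconds"] = durations.dt.total_seconds()
print(df)
```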

Answer by Michael Kazarian

You can convert the time to seconds with the time and datetime modules from the standard Python library:

import time, datetime

def convertTime(t):
    x = time.strptime(t, '%H:%M:%S')
    # return whole seconds as an int
    return int(datetime.timedelta(hours=x.tm_hour, minutes=x.tm_min, seconds=x.tm_sec).total_seconds())

convertTime('0:03:26') # Output 206
convertTime('0:04:03') # Output 243
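
One caveat: time.strptime with %H only accepts hours 00 to 23, so a duration such as '25:00:00' would raise ValueError. A sketch that skips strptime and builds the datetime.timedelta directly from the split fields (the function name is illustrative):

```python
from datetime import timedelta


def duration_to_seconds(hh_mm_ss):
    """Convert an 'H:MM:SS' duration string to whole seconds.

    Unlike time.strptime with %H, this accepts hour values of 24 or more.
    """
    hours, minutes, seconds = map(int, hh_mm_ss.split(":"))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return int(delta.total_seconds())


print(duration_to_seconds("0:03:26"))   # 206
print(duration_to_seconds("25:00:00"))  # 90000
```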