Pandas Python - Convert HH:MM:SS into seconds in aggregate (csv file)

Question

Asked by Yumi

I'm trying to convert the values in the 'Avg. Session Duration' column (HH:MM:SS) into whole numbers of seconds when reading a CSV file with the Pandas read_csv function. For instance, '0:03:26' would become 206 seconds after the conversion.

Input example:

Source       Month  Sessions    Bounce Rate     Avg. Session Duration   
ABC.com     201501   408        26.47%           0:03:26 
EFG.com     201412   398        31.45%           0:04:03

I wrote a function:

def time_convert(x):
    times = x.split(':')
    return 3600*int(times[0]) + 60*int(times[1]) + int(times[2])

This function works fine when I pass '0:03:26' to it directly. But when I tried to create a new column 'Duration' by applying the function to another column in Pandas,

df = pd.read_csv('myfile.csv')
df['Duration'] = df['Avg. Session Duration'].apply(time_convert)

It returned an Error Message:

> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-53-01e79de1cb39> in <module>()
> ----> 1 df['Avg. Session Duration'] = df['Avg. Session Duration'].apply(lambda x: x.split(':'))
> 
> /Users/yumiyang/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
>    1991             values = lib.map_infer(values, lib.Timestamp)
>    1992 
> -> 1993         mapped = lib.map_infer(values, f, convert=convert_dtype)
>    1994         if len(mapped) and isinstance(mapped[0], Series):
>    1995             from pandas.core.frame import DataFrame
> 
> /Users/yumiyang/anaconda/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:52281)()
> 
> <ipython-input-53-01e79de1cb39> in <lambda>(x)
> ----> 1 df['Avg. Session Duration'] = df['Avg. Session Duration'].apply(lambda x: x.split(':'))
> 
> AttributeError: 'float' object has no attribute 'split'

I don't know why it says the values of 'Avg. Session Duration' are floats.

Data columns (total 7 columns):
Source                   250 non-null object
Time                     251 non-null object
Sessions                 188 non-null object
Users                    188 non-null object
Bounce Rate              188 non-null object
Avg. Session Duration    188 non-null object
% New Sessions           188 non-null object
dtypes: object(7)

Can someone help me figure out where the problem is?

Answer 1

Accepted answer by jfs

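The error means that at least one value in the column is a float, not a string. The column has only 188 non-null values, and pandas represents a missing value as NaN, which is a float. Fix the way you read the data, e.g.: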

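#!/usr/bin/env python
from functools import reduce  # a builtin on Python 2; Python 3 requires this import

import pandas

def hh_mm_ss2seconds(hh_mm_ss):
    # Fold the fields left to right: (hh * 60 + mm) * 60 + ss
    return reduce(lambda acc, x: acc*60 + x, map(int, hh_mm_ss.split(':')))

# A regex separator is handled by the Python parsing engine
df = pandas.read_csv('input.csv', sep=r'\s{2,}', engine='python',
                     converters={'Avg. Session Duration': hh_mm_ss2seconds})
print(df)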

Output

    Source   Month  Sessions Bounce Rate  Avg. Session Duration
0  ABC.com  201501       408      26.47%                    206
1  EFG.com  201412       398      31.45%                    243

[2 rows x 5 columns]
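
The sample input above has no empty cells, but the question's column summary suggests the real file does. As a rough sketch, assuming blank cells reach a converter as empty strings, the converter can return NaN for them instead of failing on int(''). The CSV text and the name hh_mm_ss_to_seconds below are made up for illustration:

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical sample data; the second row has no duration.
CSV_TEXT = """Source,Month,Avg. Session Duration
ABC.com,201501,0:03:26
EFG.com,201412,
XYZ.com,201411,1:00:05
"""

def hh_mm_ss_to_seconds(text):
    """Convert 'H:MM:SS' to seconds; return NaN for a blank cell."""
    text = text.strip()
    if not text:
        return float("nan")
    return reduce(lambda acc, x: acc * 60 + x, map(int, text.split(":")))

# Without a converter, the blank cell is read as a missing value (NaN),
# which is why calling .split on it fails.
raw = pd.read_csv(io.StringIO(CSV_TEXT))
blank_is_missing = pd.isna(raw["Avg. Session Duration"].iloc[1])

# With the converter, blank cells are handled explicitly.
df = pd.read_csv(io.StringIO(CSV_TEXT),
                 converters={"Avg. Session Duration": hh_mm_ss_to_seconds})
seconds = df["Avg. Session Duration"].tolist()
print(seconds)
```

Because one entry is NaN, the resulting column holds floats rather than integers.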

Answer 2

Answer by JAB

The values in df['Avg. Session Duration'] should be strings for your function to work.

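import pandas as pd

df = pd.DataFrame({'time': ['0:03:26']})

def time_convert(x):
    # Unpack hours, minutes and seconds, then combine them into total seconds
    h, m, s = map(int, x.split(':'))
    return (h*60 + m)*60 + s

df.time.apply(time_convert)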

This works fine for me.
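
If the column also contains missing values, as the question's column summary suggests, one possible way to keep this function unchanged is Series.map with na_action='ignore', which passes NaN through without calling the function. This is only a sketch; the three-row frame below is invented for illustration:

```python
import numpy as np
import pandas as pd

def time_convert(x):
    h, m, s = map(int, x.split(':'))
    return (h * 60 + m) * 60 + s

# Hypothetical data with one missing duration.
df = pd.DataFrame({'Avg. Session Duration': ['0:03:26', np.nan, '0:04:03']})

# na_action='ignore' leaves missing values as NaN instead of passing them
# to time_convert, so x.split is only ever called on strings.
df['Duration'] = df['Avg. Session Duration'].map(time_convert, na_action='ignore')
print(df)
```

The 'Duration' column comes out as floats because NaN cannot be stored in a plain integer column.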

Answer 3

Answer by Michael Kazarian

You can convert the time to seconds with time and datetime from the standard Python library:

import time, datetime

def convertTime(t):
    # Parse the string, then rebuild it as a timedelta to get the total seconds
    x = time.strptime(t, '%H:%M:%S')
    return int(datetime.timedelta(hours=x.tm_hour, minutes=x.tm_min, seconds=x.tm_sec).total_seconds())

convertTime('0:03:26')  # Output 206
convertTime('0:04:03')  # Output 243
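
A different technique from the one above, not taken from any of the answers, is to let pandas do the parsing for the whole column with pd.to_timedelta and then read off the seconds. A minimal sketch, assuming the values look like 'H:MM:SS' and that missing entries should stay missing (the sample Series is invented):

```python
import pandas as pd

# Hypothetical sample column, including one missing value.
durations = pd.Series(['0:03:26', None, '0:04:03'])

# Parse each string as a Timedelta; missing values become NaT.
deltas = pd.to_timedelta(durations)

# total_seconds() yields floats, with NaN where the input was missing.
seconds = deltas.dt.total_seconds()
print(seconds)
```

This avoids a Python-level function call per row, at the cost of getting floats back instead of integers.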
