pandas: how to sum time in a dataframe

Notice: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/38229357/

Date: 2020-09-14 01:31:51  Source: igfitidea

How to sum time in a dataframe

Tags: python, datetime, pandas

Asked by LMLPP

I have a dataframe of time data, stored as strings, in the format

hh:mm:ss
hh:mm:ss

I need to sum the values in a few of the columns to get the total time. Does anyone have a recommendation on the best way to do this and get the sum back in the same format?

Answered by Jeff

You can do this using timedelta:


import pandas as pd
import datetime

data = {'t1':['01:15:31', 
              '00:47:15'], 
        't2':['01:13:02', 
              '00:51:33']
        }

def make_delta(entry):
    h, m, s = entry.split(':')
    return datetime.timedelta(hours=int(h), minutes=int(m), seconds=int(s))

df = pd.DataFrame(data)
# Convert every cell to a timedelta (on pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
df = df.applymap(make_delta)

df['elapsed'] = df['t1'] + df['t2']

In [23]: df
Out[23]:
        t1       t2  elapsed
0 01:15:31 01:13:02 02:28:33
1 00:47:15 00:51:33 01:38:48

Edit: I see you need to do this by column, not row. In that case do the same thing, but:


In [24]: df['t1'].sum()
Out[24]: Timedelta('0 days 02:02:46')
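
Since the question asks about totalling a few columns, the per-column sum above can be repeated for each column of interest. The following is a minimal sketch, not part of the original answer: it assumes the same sample data, uses pd.to_timedelta to parse the strings, and the `columns_to_sum` list is a hypothetical name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    't1': ['01:15:31', '00:47:15'],
    't2': ['01:13:02', '00:51:33'],
})

# Hypothetical selection of the columns whose times should be totalled.
columns_to_sum = ['t1', 't2']

# Parse each column of 'hh:mm:ss' strings into timedeltas, then sum it.
totals = {col: pd.to_timedelta(df[col]).sum() for col in columns_to_sum}

for col, total in totals.items():
    print(col, total)
```

Each value in `totals` is a pandas Timedelta, so the results can be added together or compared like any other duration.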

Answered by jezrael

You can use to_timedelta with sum:

import pandas as pd

df = pd.DataFrame({'A': ['18:22:28', '12:15:10']})

df['A'] = pd.to_timedelta(df.A)

print (df)
         A
0 18:22:28
1 12:15:10

print (df.dtypes)
A    timedelta64[ns]
dtype: object

print (df.A.sum())
1 days 06:37:38
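
The sum above prints as "1 days 06:37:38", while the question asks for the result in the original hh:mm:ss format. One possible way to convert it back is sketched below; the helper name `format_hms` is hypothetical, and the sketch assumes a non-negative total and that hours should be allowed to exceed 24 rather than roll over into days.

```python
import pandas as pd

def format_hms(total):
    """Format a non-negative Timedelta as hh:mm:ss, letting hours exceed 24."""
    seconds = int(total.total_seconds())
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

df = pd.DataFrame({'A': ['18:22:28', '12:15:10']})
total = pd.to_timedelta(df['A']).sum()

result = format_hms(total)
print(result)  # 30:37:38
```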

Answered by Alyssa Haroldsen

Maybe try using datetime.timedelta?


import re
from datetime import timedelta

_TIME_RE = re.compile(r'(\d+):(\d+):(\d+)')

def parse_timedelta(line):
    # Invalid lines (such as blank) will be considered 0 seconds
    m = _TIME_RE.match(line)
    if m is None:
        return timedelta()
    hours, minutes, seconds = [int(i) for i in m.groups()]
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def format_timedelta(delta):
    hours, rem = divmod(delta.seconds + delta.days * 86400, 3600)
    minutes, seconds = divmod(rem, 60)
    return '{:02}:{:02}:{:02}'.format(hours, minutes, seconds)

If data is a list containing the lines:

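# sum() starts from the integer 0 by default, so pass timedelta() as the start value
print(format_timedelta(sum((parse_timedelta(line) for line in data), timedelta())))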
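
For reference, here is a self-contained run of the approach in this answer. It is a sketch under stated assumptions: the contents of `data` are made-up sample lines (including a blank one, which counts as 0 seconds), and an explicit timedelta() start value is passed to sum(), because the built-in otherwise begins from the integer 0 and cannot add a timedelta to it.

```python
import re
from datetime import timedelta

_TIME_RE = re.compile(r'(\d+):(\d+):(\d+)')

def parse_timedelta(line):
    # Lines that do not match hh:mm:ss (such as blank ones) count as 0 seconds.
    m = _TIME_RE.match(line)
    if m is None:
        return timedelta()
    hours, minutes, seconds = (int(i) for i in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def format_timedelta(delta):
    # Fold whole days back into hours so the output stays in hh:mm:ss form.
    hours, rem = divmod(delta.seconds + delta.days * 86400, 3600)
    minutes, seconds = divmod(rem, 60)
    return '{:02}:{:02}:{:02}'.format(hours, minutes, seconds)

# Hypothetical sample input; the blank entry is treated as zero.
data = ['18:22:28', '', '12:15:10']

total = sum((parse_timedelta(line) for line in data), timedelta())
formatted = format_timedelta(total)
print(formatted)  # 30:37:38
```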