Calculating the difference between 'time' rows in a Pandas DataFrame

Question

Asked by Pragnya Srinivasan

My DataFrame has the following form:

       TimeWeek   TimeSat  TimeHoli
0      6:40:00   8:00:00   8:00:00
1      6:45:00   8:05:00   8:05:00
2      6:50:00   8:09:00   8:10:00
3      6:55:00   8:11:00   8:14:00
4      6:58:00   8:13:00   8:17:00
5      7:40:00   8:15:00   8:21:00

I need to find the time difference between consecutive rows in TimeWeek, TimeSat and TimeHoli. The output must be:

TimeWeekDiff   TimeSatDiff  TimeHoliDiff
00:05:00          00:05:00       00:05:00
00:05:00          00:04:00       00:05:00
00:05:00          00:02:00       00:04:00  
00:03:00          00:02:00       00:03:00
00:02:00          00:02:00       00:04:00

I tried using df['TimeWeek'] - df['TimeWeek'].shift().fillna(0), but it throws an error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

This is probably because of the ':' in the column values. How do I resolve this?

Answer 1

Answered by Alexander

It looks like the error is thrown because the columns hold strings rather than timestamps, and strings cannot be subtracted. First convert them to timestamps:

df2 = df.apply(lambda x: [pd.Timestamp(ts) for ts in x])

They will contain today's date by default, but the date cancels out once you take differences between the times (hopefully you don't have to worry about differencing 23:55 and 00:05 across dates).

Once converted, simply subtract the shifted DataFrame from the original:

>>> df2 - df2.shift()
   TimeWeek  TimeSat  TimeHoli
0       NaT      NaT       NaT
1  00:05:00 00:05:00  00:05:00
2  00:05:00 00:04:00  00:05:00
3  00:05:00 00:02:00  00:04:00
4  00:03:00 00:02:00  00:03:00
5  00:42:00 00:02:00  00:04:00

Depending on your needs, you can take just the rows from index 1 onward (dropping the NaTs):

(df2 - df2.shift()).iloc[1:, :]

or you can fill the NaTs with zero-length timedeltas:

(df2 - df2.shift()).fillna(pd.Timedelta(0))
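
The midnight caveat mentioned earlier in this answer can be handled with a small adjustment. The following is only a minimal sketch: the sample times are hypothetical (they are not from the question), it parses the strings with pd.to_timedelta rather than pd.Timestamp, and it assumes consecutive rows are always less than 24 hours apart.

```python
import pandas as pd

# Hypothetical sample that crosses midnight; not part of the question's data.
times = pd.Series(["23:50:00", "23:55:00", "00:05:00"])

# Parse as durations since midnight, then difference consecutive rows.
deltas = pd.to_timedelta(times).diff()

# A negative difference means the clock wrapped past midnight,
# so add one day to recover the elapsed time (assumes gaps under 24 hours).
one_day = pd.Timedelta(days=1)
wrapped = deltas < pd.Timedelta(0)
deltas = deltas.where(~wrapped, deltas + one_day)

print(deltas)
```

The first row stays NaT because it has no predecessor; only rows whose raw difference is negative are shifted by a day.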

Answer 2

Answered by jwilner

Pandas has great timedelta parsing, so the strings can be converted directly with pd.to_timedelta:

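import pandas as pd
df["TimeWeek"] = pd.to_timedelta(df["TimeWeek"])  # parse "H:MM:SS" strings into timedeltas
# Fill the shifted column's leading NaT with zero, so row 0 becomes its offset from midnight
df["TimeWeek"] - df["TimeWeek"].shift().fillna(pd.to_timedelta("00:00:00"))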
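
The same idea can be extended to all three columns at once. This is a sketch rather than part of the original answer: it rebuilds the question's DataFrame, and the names durations and diffs are introduced here purely for illustration.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "TimeWeek": ["6:40:00", "6:45:00", "6:50:00", "6:55:00", "6:58:00", "7:40:00"],
    "TimeSat": ["8:00:00", "8:05:00", "8:09:00", "8:11:00", "8:13:00", "8:15:00"],
    "TimeHoli": ["8:00:00", "8:05:00", "8:10:00", "8:14:00", "8:17:00", "8:21:00"],
})

# Parse every column of strings into timedelta64 values.
durations = df.apply(pd.to_timedelta)

# Row-to-row differences; the first row has no predecessor (NaT), so drop it,
# then rename the columns to the names requested in the question.
diffs = (durations - durations.shift()).dropna().add_suffix("Diff")

print(diffs)
```

Note that the last TimeWeek difference comes out as 42 minutes (6:58:00 to 7:40:00), which matches the output shown in Alexander's answer.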

Answer 3

Answered by S.Sreeram

>>> import pandas as pd
>>> df = pd.DataFrame({'TimeWeek': ['6:40:00', '6:45:00', '6:50:00', '6:55:00', '7:40:00']})
>>> df["TimeWeek_date"] = pd.to_datetime(df["TimeWeek"], format="%H:%M:%S")
>>> print(df)
  TimeWeek       TimeWeek_date
0  6:40:00 1900-01-01 06:40:00
1  6:45:00 1900-01-01 06:45:00
2  6:50:00 1900-01-01 06:50:00
3  6:55:00 1900-01-01 06:55:00
4  7:40:00 1900-01-01 07:40:00
>>> df['TimeWeekDiff'] = (df['TimeWeek_date'] - df['TimeWeek_date'].shift().fillna(pd.to_datetime("00:00:00", format="%H:%M:%S")))
>>> print(df)
  TimeWeek       TimeWeek_date  TimeWeekDiff
0  6:40:00 1900-01-01 06:40:00      06:40:00
1  6:45:00 1900-01-01 06:45:00      00:05:00
2  6:50:00 1900-01-01 06:50:00      00:05:00
3  6:55:00 1900-01-01 06:55:00      00:05:00
4  7:40:00 1900-01-01 07:40:00      00:45:00
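
If the gaps are needed as plain numbers, the timedelta column can be converted afterwards. The sketch below is an illustrative variation, not the answer's own code: it leaves the first row as NaT instead of measuring it against midnight, and the DiffMinutes column name is an assumption made up for this example.

```python
import pandas as pd

# Same sample column as in the answer above.
df = pd.DataFrame({"TimeWeek": ["6:40:00", "6:45:00", "6:50:00", "6:55:00", "7:40:00"]})

parsed = pd.to_datetime(df["TimeWeek"], format="%H:%M:%S")

# Leave the first row as NaT rather than filling it with midnight.
df["TimeWeekDiff"] = parsed - parsed.shift()

# Express each gap as a number of minutes (NaN for the first row).
df["DiffMinutes"] = df["TimeWeekDiff"].dt.total_seconds() / 60

print(df)
```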
