Binning time series with Pandas

Question

Asked by displayname

I have a time series in the form of a DataFrame that I can groupby into a series:

pan.groupby(pan.Time).mean()

which has just two columns, Time and Value:

Time                Value
2015-04-24 06:38:49 0.023844
2015-04-24 06:39:19 0.019075
2015-04-24 06:43:49 0.023844
2015-04-24 06:44:18 0.019075
2015-04-24 06:44:48 0.023844
2015-04-24 06:45:18 0.019075
2015-04-24 06:47:48 0.023844
2015-04-24 06:48:18 0.019075
2015-04-24 06:50:48 0.023844
2015-04-24 06:51:18 0.019075
2015-04-24 06:51:48 0.023844
2015-04-24 06:52:18 0.019075
2015-04-24 06:52:48 0.023844
2015-04-24 06:53:48 0.019075
2015-04-24 06:55:18 0.023844
2015-04-24 07:00:47 0.019075
2015-04-24 07:01:17 0.023844
2015-04-24 07:01:47 0.019075

What I'm trying to do is figure out how to bin those values at a sampling rate of, e.g., 30 seconds and average the bins that contain more than one observation.

As a last step I'd need to interpolate those values, but I'm sure there's something out there I can use.

However, I just can't figure out how to do the binning and averaging of those values. Time is a datetime.datetime object, not a str.

I've tried different things, but nothing works. Exceptions are flying around.

Has anybody out there got this working?

Answer 1

Answered by Nickil Maveli

IIUC, you could use TimeGrouper (pd.Grouper in current pandas, which no longer ships TimeGrouper) along with groupby on the index level to calculate the averages for the Value column as shown:

import pandas as pd

df.set_index('Time', inplace=True)
# Mean per 2-minute bin; pd.Grouper replaces pd.TimeGrouper, which newer pandas no longer provides
df_group = df.groupby(pd.Grouper(level='Time', freq='2min'))['Value'].agg('mean')
# Drop bins that contain no observations
df_group.dropna(inplace=True)
df_group = df_group.to_frame().reset_index()
print(df_group)

                 Time     Value
0 2015-04-24 06:38:00  0.021459
1 2015-04-24 06:42:00  0.023844
2 2015-04-24 06:44:00  0.020665
3 2015-04-24 06:46:00  0.023844
4 2015-04-24 06:48:00  0.019075
5 2015-04-24 06:50:00  0.022254
6 2015-04-24 06:52:00  0.020665
7 2015-04-24 06:54:00  0.023844
8 2015-04-24 07:00:00  0.020665
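If you would rather not move Time into the index, pd.Grouper also accepts a key argument naming the datetime column. The following is only a small sketch on the first three rows of the question's data; the variable names df and means are placeholders chosen for illustration.

```python
import pandas as pd

# First three rows of the question's data, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-04-24 06:38:49",
        "2015-04-24 06:39:19",
        "2015-04-24 06:43:49",
    ]),
    "Value": [0.023844, 0.019075, 0.023844],
})

# key= names the datetime column to bin on, so set_index is not needed.
# Empty 2-minute bins come back as NaN and are dropped here.
means = df.groupby(pd.Grouper(key="Time", freq="2min"))["Value"].mean().dropna()
print(means)
```

The result is a Series indexed by the left edge of each 2-minute bin.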

You could also use resample, as pointed out by @Paul H, which is rather concise for this situation.

# '2min' is the 2-minute frequency alias ('2T' is deprecated in recent pandas)
print(df.set_index('Time').resample('2min').mean().dropna().reset_index())

                 Time     Value
0 2015-04-24 06:38:00  0.021459
1 2015-04-24 06:42:00  0.023844
2 2015-04-24 06:44:00  0.020665
3 2015-04-24 06:46:00  0.023844
4 2015-04-24 06:48:00  0.019075
5 2015-04-24 06:50:00  0.022254
6 2015-04-24 06:52:00  0.020665
7 2015-04-24 06:54:00  0.023844
8 2015-04-24 07:00:00  0.020665
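The question actually asks for 30-second bins followed by interpolation. A rough sketch of that, assuming linear interpolation over time is what is wanted (the question does not say which method), could keep the empty bins instead of dropping them and then fill them with interpolate. Only five rows of the question's data are used, and the names binned and filled are placeholders.

```python
import pandas as pd

# Five rows from the question's data, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-04-24 06:44:18",
        "2015-04-24 06:44:48",
        "2015-04-24 06:45:18",
        "2015-04-24 06:47:48",
        "2015-04-24 06:48:18",
    ]),
    "Value": [0.019075, 0.023844, 0.019075, 0.023844, 0.019075],
})

# Average into 30-second bins; bins with no observations stay NaN.
binned = df.set_index("Time")["Value"].resample("30s").mean()

# Fill the empty bins by linear interpolation along the time axis
# (assumption: linear-in-time is an acceptable interpolation here).
filled = binned.interpolate(method="time")
print(filled)
```

Here the empty bins are kept on purpose: dropping them with dropna, as in the 2-minute examples, would leave nothing to interpolate.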
