Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/39002122/

Date: 2020-09-14 01:50:25  Source: igfitidea

Binning time series with pandas

python, pandas

Question by displayname

I have a time series in the form of a DataFrame that I can reduce to a series with groupby

pan.groupby(pan.Time).mean()

which has just two columns, Time and Value:

Time                Value
2015-04-24 06:38:49 0.023844
2015-04-24 06:39:19 0.019075
2015-04-24 06:43:49 0.023844
2015-04-24 06:44:18 0.019075
2015-04-24 06:44:48 0.023844
2015-04-24 06:45:18 0.019075
2015-04-24 06:47:48 0.023844
2015-04-24 06:48:18 0.019075
2015-04-24 06:50:48 0.023844
2015-04-24 06:51:18 0.019075
2015-04-24 06:51:48 0.023844
2015-04-24 06:52:18 0.019075
2015-04-24 06:52:48 0.023844
2015-04-24 06:53:48 0.019075
2015-04-24 06:55:18 0.023844
2015-04-24 07:00:47 0.019075
2015-04-24 07:01:17 0.023844
2015-04-24 07:01:47 0.019075

What I'm trying to do is figure out how to bin those values at a sampling rate of, e.g., 30 seconds and average the bins that contain more than one observation.
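
A minimal sketch of the binning step, assuming the data sits in a DataFrame with a datetime64 Time column (the variable names here are illustrative). It uses DataFrame.resample with on='Time' and reports both the mean and the number of observations in each bin. Because the sample timestamps are spaced roughly 30 seconds apart, each non-empty 30-second bin would hold a single observation, so a 1-minute bin width is used here to show averaging; pass "30s" instead for the rate mentioned above.

```python
import pandas as pd

# A few rows copied from the table above; "Time" is a datetime64 column
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-04-24 06:38:49", "2015-04-24 06:39:19",
        "2015-04-24 06:43:49", "2015-04-24 06:44:18",
        "2015-04-24 06:44:48", "2015-04-24 06:45:18",
    ]),
    "Value": [0.023844, 0.019075, 0.023844, 0.019075, 0.023844, 0.019075],
})

# Bin into 1-minute intervals and compute the mean and count per bin
binned = df.resample("1min", on="Time")["Value"].agg(["mean", "count"])

# Empty bins have count 0 (and a NaN mean); keep only bins that received data
binned = binned[binned["count"] > 0]
print(binned)
```

Here the 06:44 bin averages two readings (06:44:18 and 06:44:48), while the empty minutes between 06:39 and 06:43 are removed by the count filter.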

As a last step I'd need to interpolate those values, but I'm sure there's something out there I can use.
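
For the interpolation step, one option (an assumption, not something the question settles) is to leave empty bins as NaN and fill them with Series.interpolate(method="time"), which interpolates linearly with respect to the time index. A minimal sketch on four of the sample rows:

```python
import pandas as pd

# Sample rows from the table above, indexed by time
s = pd.Series(
    [0.023844, 0.019075, 0.023844, 0.019075],
    index=pd.to_datetime([
        "2015-04-24 06:38:49", "2015-04-24 06:39:19",
        "2015-04-24 06:43:49", "2015-04-24 06:44:18",
    ]),
)

# Average into 1-minute bins; minutes with no observation become NaN
binned = s.resample("1min").mean()

# Fill the empty bins by linear interpolation weighted by the time index
filled = binned.interpolate(method="time")
print(filled)
```

The three empty minutes between 06:39 and 06:43 are filled with values on a straight line between the neighboring bin means.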

However, I just can't figure out how to do the binning and averaging of those values. Time is a datetime.datetime object, not a str.
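
As a quick check of the data types involved (a sketch with made-up variable names): pandas stores a column of datetime.datetime objects with a datetime64 dtype, and setting that column as the index gives the DatetimeIndex that resample and time-based grouping expect. If the times ever arrive as strings, pd.to_datetime converts them.

```python
import datetime as dt

import pandas as pd

# A column built from datetime.datetime objects gets a datetime64 dtype
df = pd.DataFrame({
    "Time": [dt.datetime(2015, 4, 24, 6, 38, 49), dt.datetime(2015, 4, 24, 6, 39, 19)],
    "Value": [0.023844, 0.019075],
})
print(df["Time"].dtype)

# Using it as the index yields a DatetimeIndex, which resample requires
print(type(df.set_index("Time").index))

# If the times arrive as strings instead, convert them explicitly
raw = pd.DataFrame({"Time": ["2015-04-24 06:38:49"], "Value": [0.023844]})
raw["Time"] = pd.to_datetime(raw["Time"])
print(raw["Time"].dtype)
```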

I've tried different things, but nothing works; exceptions are flying around.

Does anybody out there know how to do this?

Answer by Nickil Maveli

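IIUC (if I understand correctly), you could use TimeGrouper along with groupby on the index level to calculate the averages for the Value column, as shown below. In current pandas, TimeGrouper has been replaced by pd.Grouper: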

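import pandas as pd

# Use Time as the index; assigning (instead of inplace=True) leaves df unchanged
indexed = df.set_index('Time')
# Take mean values for a frequency of 2 minutes
# (pd.TimeGrouper was removed in pandas 1.0; pd.Grouper replaces it)
df_group = indexed.groupby(pd.Grouper(level='Time', freq='2min'))['Value'].agg('mean')
# Bins without observations are NaN; drop them
df_group = df_group.dropna()
df_group = df_group.to_frame().reset_index()
print(df_group)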

                 Time     Value
0 2015-04-24 06:38:00  0.021459
1 2015-04-24 06:42:00  0.023844
2 2015-04-24 06:44:00  0.020665
3 2015-04-24 06:46:00  0.023844
4 2015-04-24 06:48:00  0.019075
5 2015-04-24 06:50:00  0.022254
6 2015-04-24 06:52:00  0.020665
7 2015-04-24 06:54:00  0.023844
8 2015-04-24 07:00:00  0.020665
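
A variant of the same grouping, sketched on the first six rows of the question's data: passing key="Time" to pd.Grouper groups on the column directly, so no set_index call is needed. The three bins it produces should agree with the first three rows of the output above.

```python
import pandas as pd

# First rows of the question's data; "Time" stays an ordinary column
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-04-24 06:38:49", "2015-04-24 06:39:19",
        "2015-04-24 06:43:49", "2015-04-24 06:44:18",
        "2015-04-24 06:44:48", "2015-04-24 06:45:18",
    ]),
    "Value": [0.023844, 0.019075, 0.023844, 0.019075, 0.023844, 0.019075],
})

# key= points pd.Grouper at a column, so the index does not need to change
means = df.groupby(pd.Grouper(key="Time", freq="2min"))["Value"].mean().dropna()
print(means.reset_index())
```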

You could also use resample, as pointed out by @Paul H; it is rather concise for this situation.

# Resample into 2-minute bins ('2min'; the older '2T' alias is deprecated in recent pandas)
print(df.set_index('Time').resample('2min').mean().dropna().reset_index())

                 Time     Value
0 2015-04-24 06:38:00  0.021459
1 2015-04-24 06:42:00  0.023844
2 2015-04-24 06:44:00  0.020665
3 2015-04-24 06:46:00  0.023844
4 2015-04-24 06:48:00  0.019075
5 2015-04-24 06:50:00  0.022254
6 2015-04-24 06:52:00  0.020665
7 2015-04-24 06:54:00  0.023844
8 2015-04-24 07:00:00  0.020665