Binning time series with pandas
Original question: http://stackoverflow.com/questions/39002122/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow and keep it under the same license.
Asked by displayname
I have a time series in the form of a DataFrame that I can group by Time into a series:
pan.groupby(pan.Time).mean()
It has just two columns, Time and Value:
Time Value
2015-04-24 06:38:49 0.023844
2015-04-24 06:39:19 0.019075
2015-04-24 06:43:49 0.023844
2015-04-24 06:44:18 0.019075
2015-04-24 06:44:48 0.023844
2015-04-24 06:45:18 0.019075
2015-04-24 06:47:48 0.023844
2015-04-24 06:48:18 0.019075
2015-04-24 06:50:48 0.023844
2015-04-24 06:51:18 0.019075
2015-04-24 06:51:48 0.023844
2015-04-24 06:52:18 0.019075
2015-04-24 06:52:48 0.023844
2015-04-24 06:53:48 0.019075
2015-04-24 06:55:18 0.023844
2015-04-24 07:00:47 0.019075
2015-04-24 07:01:17 0.023844
2015-04-24 07:01:47 0.019075
What I'm trying to do is figure out how to bin those values at a sampling rate of, e.g., 30 seconds, and average the bins that contain more than one observation.
As a last step I'd need to interpolate those values, but I'm sure there's something out there I can use for that.
However, I just can't figure out how to do the binning and averaging of those values. Time is a datetime.datetime object, not a str.
I've tried different things, but nothing works; exceptions keep flying around.
Has anybody out there figured this out?
Answered by Nickil Maveli
IIUC, you could use TimeGrouper along with groupby on the index level to calculate the averages for the Value column, as shown below. (TimeGrouper was deprecated in pandas 0.21 and removed in 1.0; pd.Grouper accepts the same level and freq arguments.)
df.set_index('Time', inplace=True)
# Taking mean values for a frequency of 2 minutes
# (pd.TimeGrouper was removed in pandas 1.0; pd.Grouper is its replacement)
df_group = df.groupby(pd.Grouper(level='Time', freq='2T'))['Value'].agg('mean')
df_group.dropna(inplace=True)
df_group = df_group.to_frame().reset_index()
print(df_group)
Time Value
0 2015-04-24 06:38:00 0.021459
1 2015-04-24 06:42:00 0.023844
2 2015-04-24 06:44:00 0.020665
3 2015-04-24 06:46:00 0.023844
4 2015-04-24 06:48:00 0.019075
5 2015-04-24 06:50:00 0.022254
6 2015-04-24 06:52:00 0.020665
7 2015-04-24 06:54:00 0.023844
8 2015-04-24 07:00:00 0.020665
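For anyone who wants to reproduce the result above on a recent pandas, here is a minimal self-contained sketch. It rebuilds the sample from the question and groups on the index with pd.Grouper. The names df and df_group follow the answer; the '2min' frequency string is an assumption on my part, chosen because newer pandas releases warn about the older 'T' alias.

```python
import pandas as pd

# Sample data copied from the question (18 observations).
times = pd.to_datetime([
    "2015-04-24 06:38:49", "2015-04-24 06:39:19", "2015-04-24 06:43:49",
    "2015-04-24 06:44:18", "2015-04-24 06:44:48", "2015-04-24 06:45:18",
    "2015-04-24 06:47:48", "2015-04-24 06:48:18", "2015-04-24 06:50:48",
    "2015-04-24 06:51:18", "2015-04-24 06:51:48", "2015-04-24 06:52:18",
    "2015-04-24 06:52:48", "2015-04-24 06:53:48", "2015-04-24 06:55:18",
    "2015-04-24 07:00:47", "2015-04-24 07:01:17", "2015-04-24 07:01:47",
])
# The values in the question strictly alternate between these two numbers.
df = pd.DataFrame({"Time": times, "Value": [0.023844, 0.019075] * 9})

# Group the DatetimeIndex into 2-minute bins and average each bin.
# Empty bins come back as NaN, so they are dropped afterwards.
df_group = (
    df.set_index("Time")
      .groupby(pd.Grouper(freq="2min"))["Value"]
      .mean()
      .dropna()
      .reset_index()
)
print(df_group)
```

This should yield the same nine rows as the output shown above, with bins labelled by their left edge (06:38:00, 06:42:00, and so on).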
You could also use resample, as pointed out by @Paul H, which is rather concise for this situation:
print(df.set_index('Time').resample('2T').mean().dropna().reset_index())
Time Value
0 2015-04-24 06:38:00 0.021459
1 2015-04-24 06:42:00 0.023844
2 2015-04-24 06:44:00 0.020665
3 2015-04-24 06:46:00 0.023844
4 2015-04-24 06:48:00 0.019075
5 2015-04-24 06:50:00 0.022254
6 2015-04-24 06:52:00 0.020665
7 2015-04-24 06:54:00 0.023844
8 2015-04-24 07:00:00 0.020665
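The question actually asks for 30-second bins followed by interpolation, so the same resample approach might be extended roughly as follows. This is only a sketch under assumptions: the question does not say which kind of interpolation is wanted, so time-weighted linear interpolation via Series.interpolate is my guess, and the names binned and filled are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (18 observations).
times = pd.to_datetime([
    "2015-04-24 06:38:49", "2015-04-24 06:39:19", "2015-04-24 06:43:49",
    "2015-04-24 06:44:18", "2015-04-24 06:44:48", "2015-04-24 06:45:18",
    "2015-04-24 06:47:48", "2015-04-24 06:48:18", "2015-04-24 06:50:48",
    "2015-04-24 06:51:18", "2015-04-24 06:51:48", "2015-04-24 06:52:18",
    "2015-04-24 06:52:48", "2015-04-24 06:53:48", "2015-04-24 06:55:18",
    "2015-04-24 07:00:47", "2015-04-24 07:01:17", "2015-04-24 07:01:47",
])
df = pd.DataFrame({"Time": times, "Value": [0.023844, 0.019075] * 9})

# Step 1: bin into 30-second intervals and average within each bin.
# Bins without any observation are kept as NaN so they can be filled later.
binned = df.set_index("Time")["Value"].resample("30s").mean()

# Step 2 (assumed method): fill the empty bins by interpolating
# linearly along the time axis between the neighbouring filled bins.
filled = binned.interpolate(method="time")

print(filled.reset_index())
```

Unlike the 2-minute examples, dropna() is deliberately left out here, since the empty bins are exactly the ones the interpolation step is meant to fill.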