Weighted moving average in Python
Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/18517722/
Asked by DanHickstein
I have data sampled at essentially random intervals. I would like to compute a weighted moving average using numpy (or other python package). I have a crude implementation of a moving average, but I am having trouble finding a good way to do a weighted moving average, so that the values towards the center of the bin are weighted more than values towards the edges.
Here I generate some sample data and then take a moving average. How can I most easily implement a weighted moving average? Thanks!
import numpy as np
import matplotlib.pyplot as plt

#first generate some datapoint for a randomly sampled noisy sinewave
x = np.random.random(1000)*10
noise = np.random.normal(scale=0.3,size=len(x))
y = np.sin(x) + noise

#plot the data
plt.plot(x,y,'ro',alpha=0.3,ms=4,label='data')
plt.xlabel('Time')
plt.ylabel('Intensity')

#define a moving average function
def moving_average(x,y,step_size=.1,bin_size=1):
    bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
    bin_avg = np.zeros(len(bin_centers))

    for index in range(0,len(bin_centers)):
        bin_center = bin_centers[index]
        items_in_bin = y[(x>(bin_center-bin_size*0.5) ) & (x<(bin_center+bin_size*0.5))]
        bin_avg[index] = np.mean(items_in_bin)

    return bin_centers,bin_avg

#plot the moving average
bins, average = moving_average(x,y)
plt.plot(bins, average,label='moving average')

plt.show()
The output:
Following the advice from crs17 to use "weights=" in the np.average function, I came up with a weighted average function that uses a Gaussian function to weight the data:
def weighted_moving_average(x,y,step_size=0.05,width=1):
    bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
    bin_avg = np.zeros(len(bin_centers))

    #We're going to weight with a Gaussian function
    def gaussian(x,amp=1,mean=0,sigma=1):
        return amp*np.exp(-(x-mean)**2/(2*sigma**2))

    for index in range(0,len(bin_centers)):
        bin_center = bin_centers[index]
        weights = gaussian(x,mean=bin_center,sigma=width)
        bin_avg[index] = np.average(y,weights=weights)

    return (bin_centers,bin_avg)
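The loop above evaluates the Gaussian once per bin center. As a minimal sketch, assuming the full weight matrix (number of bin centers times number of samples) fits comfortably in memory, the same calculation can be written with NumPy broadcasting. The name `weighted_moving_average_vectorized` is hypothetical and not part of the original post.

```python
import numpy as np

def weighted_moving_average_vectorized(x, y, step_size=0.05, width=1.0):
    """Gaussian-weighted moving average without an explicit Python loop."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Same bin centers as the loop version.
    bin_centers = (np.arange(x.min(), x.max() - 0.5 * step_size, step_size)
                   + 0.5 * step_size)
    # weights[i, j] is the Gaussian weight of sample j as seen from bin center i.
    diff = x[np.newaxis, :] - bin_centers[:, np.newaxis]
    weights = np.exp(-diff**2 / (2 * width**2))
    # Weighted mean per bin center: sum(w * y) / sum(w).
    bin_avg = weights @ y / weights.sum(axis=1)
    return bin_centers, bin_avg
```

For the 1000-point sample data in the question this builds a matrix of roughly 200 x 1000 floats, which is small; for much larger inputs the loop version may be the safer choice.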
Results look good:
Accepted answer by crs17
You could use numpy.average, which allows you to specify weights:
>>> bin_avg[index] = np.average(items_in_bin, weights=my_weights)
So to calculate the weights you could find the x coordinates of each data point in the bin and calculate their distances to the bin center.
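One way to turn those distances into weights is a triangular profile: weight 1 at the bin center, falling linearly to 0 at the bin edges. This is only a sketch; the triangular shape and the helper name `triangular_weighted_bin_average` are assumptions for illustration, not something the answer prescribes.

```python
import numpy as np

def triangular_weighted_bin_average(x, y, bin_center, bin_size=1.0):
    """Weighted mean of the samples inside one bin, favouring the center."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half_width = 0.5 * bin_size
    # Distance of every sample from the bin center along x.
    distance = np.abs(x - bin_center)
    in_bin = distance < half_width
    if not in_bin.any():
        return np.nan  # no samples fall in this bin
    # Weight is 1 at the bin center and falls linearly to 0 at the edges.
    my_weights = 1.0 - distance[in_bin] / half_width
    return np.average(y[in_bin], weights=my_weights)
```

Inside the question's loop this would replace the np.mean call, for example `bin_avg[index] = triangular_weighted_bin_average(x, y, bin_centers[index], bin_size)`.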
Answer by Jaime
This won't give an exact solution, but it will make your life easier, and will probably be good enough. First, average your samples in small bins. Once you have resampled your data to be equispaced, you can use stride tricks and np.average to do a weighted average:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def moving_weighted_average(x, y, step_size=.1, steps_per_bin=10,
                            weights=None):
    # This ensures that all samples are within a bin
    number_of_bins = int(np.ceil(np.ptp(x) / step_size))
    bins = np.linspace(np.min(x), np.min(x) + step_size*number_of_bins,
                       num=number_of_bins+1)
    bins -= (bins[-1] - np.max(x)) / 2
    bin_centers = bins[:-steps_per_bin] + step_size*steps_per_bin/2

    # Per-bin mean of y: sum of y in each bin divided by the sample count
    counts, _ = np.histogram(x, bins=bins)
    vals, _ = np.histogram(x, bins=bins, weights=y)
    bin_avgs = vals / counts
    n = len(bin_avgs)
    # Overlapping windows of steps_per_bin consecutive bin averages
    windowed_bin_avgs = as_strided(bin_avgs,
                                   (n-steps_per_bin+1, steps_per_bin),
                                   bin_avgs.strides*2)

    weighted_average = np.average(windowed_bin_avgs, axis=1, weights=weights)

    return bin_centers, weighted_average
You can now do something like this:
#plot the moving average with triangular weights
weights = np.concatenate((np.arange(0, 5), np.arange(0, 5)[::-1]))
bins, average = moving_weighted_average(x, y, steps_per_bin=len(weights),
                                        weights=weights)
plt.plot(bins, average,label='moving average')

plt.show()
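The windowing step in this answer relies on as_strided, which NumPy's documentation warns is easy to misuse because a wrong shape or stride reads arbitrary memory. As a hedged alternative, assuming NumPy 1.20 or newer, `sliding_window_view` builds the same overlapping windows with bounds checking. The helper name `windowed_weighted_average` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_weighted_average(values, weights):
    """Weighted average over every full window of len(weights) samples."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Read-only view of shape (len(values) - len(weights) + 1, len(weights)).
    windows = sliding_window_view(values, len(weights))
    # np.average normalizes by the sum of the weights along axis 1.
    return np.average(windows, axis=1, weights=weights)
```

Applied to the equispaced `bin_avgs` array computed inside `moving_weighted_average`, this should give the same result as the as_strided version for the same weights.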