Original question: http://stackoverflow.com/questions/37556487/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Remove spikes from signal in Python
Asked by wiedzminYo
I have a signal from a respiration recording with a lot of spikes, caused by yawns for example. I tried to remove them using the rolling mean function from pandas, but it didn't help. The green area on this graph is the result of the rolling mean.
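import pandas as pd

RESP = pd.DataFrame(RESP)
# originally written as pd.rolling_mean(RESP, 50), which newer pandas releases no longer provide
RESP_AV = RESP.rolling(50).mean()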
I don't know much about filtering data and I couldn't find any other way in pandas to remove these spikes, so my question is where to look for an answer. The result of RESP.head() is:
0 -2562.863389
1 -2035.020403
2 -2425.538355
3 -2554.280563
4 -2242.438367
6.7636961937
Accepted answer by AndreiS
The following function removes the highest spike from an array yi and replaces the spike area with a parabola:
import numpy as np

def despike(yi, th=1.e-8):
    '''Remove the highest spike from array yi. The spike area is where the
    difference between neighboring points is higher than th.'''
    y = np.copy(yi)  # use y = yi if it is OK to modify the input array
    n = len(y)
    x = np.arange(n)
    c = np.argmax(y)  # index of the highest point
    d = abs(np.diff(y))
    try:
        # walk left and right from the peak until neighboring points differ by less than th
        l = c - 1 - np.where(d[c-1::-1] < th)[0][0]
        r = c + np.where(d[c:] < th)[0][0] + 1
    except IndexError:  # no spike, return unaltered array
        return y
    # for the fit, use an area twice as wide as the spike
    if (r-l) <= 3:
        l -= 1
        r += 1
    s = int(round((r-l)/2.))
    lx = l - s
    rx = r + s
    # make a gap at the spike area
    xgapped = np.concatenate((x[lx:l], x[r:rx]))
    ygapped = np.concatenate((y[lx:l], y[r:rx]))
    # quadratic fit of the gapped array
    z = np.polyfit(xgapped, ygapped, 2)
    p = np.poly1d(z)
    y[l:r] = p(x[l:r])
    return y
To remove many spikes: find the position of the highest spike, apply this function to a narrow area around the spike, and repeat.
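A minimal sketch of that repeat loop, in case it helps. The wrapper name `despike_all` and its parameters `level`, `half_width` and `max_iter` are my own assumptions and not part of the answer; the stopping rule (stop once the maximum is at or below `level`) is just one possible choice. The `despike` function is repeated here only so the block runs on its own.

```python
import numpy as np

def despike(yi, th=1.e-8):
    # Same function as in the answer above, repeated so this block is self-contained.
    y = np.copy(yi)
    x = np.arange(len(y))
    c = np.argmax(y)
    d = abs(np.diff(y))
    try:
        l = c - 1 - np.where(d[c-1::-1] < th)[0][0]
        r = c + np.where(d[c:] < th)[0][0] + 1
    except IndexError:
        return y
    if (r - l) <= 3:
        l -= 1
        r += 1
    s = int(round((r - l) / 2.))
    lx, rx = l - s, r + s
    xgapped = np.concatenate((x[lx:l], x[r:rx]))
    ygapped = np.concatenate((y[lx:l], y[r:rx]))
    p = np.poly1d(np.polyfit(xgapped, ygapped, 2))
    y[l:r] = p(x[l:r])
    return y

def despike_all(yi, level, half_width=50, th=1.e-8, max_iter=100):
    """Hypothetical wrapper: despike a window around the current maximum
    until the maximum is at or below `level` (assumed stopping rule)."""
    y = np.array(yi, dtype=float)  # work on a copy of the input
    for _ in range(max_iter):
        c = int(np.argmax(y))
        if y[c] <= level:
            break
        lo = max(c - half_width, 0)
        hi = min(c + half_width + 1, len(y))
        cleaned = despike(y[lo:hi], th)
        if np.array_equal(cleaned, y[lo:hi]):
            break  # despike found no spike edges, so stop instead of looping forever
        y[lo:hi] = cleaned
    return y

# Toy signal: flat baseline with two triangular spikes.
signal = np.zeros(200)
signal[49:52] = [5.0, 10.0, 5.0]
signal[149:152] = [4.0, 8.0, 4.0]
cleaned = despike_all(signal, level=1.0, half_width=20)
```

On noisy data `th` would probably have to be raised above the typical sample-to-sample difference, otherwise `despike` finds no spike edges and returns the window unchanged.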
Answer by xvan
I know of two ways to deal with this:
Design a better filter:
1) Determine your signal band:
Compare a spectrogram of your signal with your time signal, and compare the non-spike segments with the spike segments, to determine the maximum useful frequency (cutoff frequency) and the lowest frequency at which the spikes show up (stop frequency).
2) Design a low-pass filter: if you have MATLAB, use fdatool; if you want to use Python, use remez (scipy.signal.remez).
3) Use that custom low-pass filter instead of the rolling mean. If you don't like the result, redesign the filter (band weights and window size).
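Steps 2 and 3 could look roughly like this with SciPy. This is only a sketch: the helper names `design_lowpass` and `apply_lowpass`, the sampling rate, and the cutoff/stop frequencies are assumptions for illustration, and the real values would come from looking at the spectrogram in step 1. Using `filtfilt` (zero-phase filtering) instead of a plain causal filter is also my choice, not something the answer specifies.

```python
import numpy as np
from scipy.signal import remez, filtfilt

def design_lowpass(fs, cutoff_hz, stop_hz, numtaps=101):
    """Design FIR low-pass taps with the Remez exchange algorithm.
    Band edges are given in cycles per sample (0 to 0.5)."""
    bands = [0.0, cutoff_hz / fs, stop_hz / fs, 0.5]
    desired = [1.0, 0.0]  # gain 1 in the pass band, gain 0 in the stop band
    return remez(numtaps, bands, desired)

def apply_lowpass(x, taps):
    """Filter forward and backward so the output has no phase shift.
    The signal has to be longer than about 3 * len(taps) samples."""
    return filtfilt(taps, [1.0], x)

# Toy example with assumed numbers: 100 Hz sampling, slow 0.5 Hz "breathing"
# component plus a 20 Hz disturbance that the filter should suppress.
fs = 100.0
t = np.arange(2000) / fs
slow = np.sin(2 * np.pi * 0.5 * t)
noisy = slow + np.sin(2 * np.pi * 20.0 * t)

taps = design_lowpass(fs, cutoff_hz=1.0, stop_hz=5.0)
filtered = apply_lowpass(noisy, taps)
```

If the result is not good enough, `numtaps`, the band edges, or the optional `weight` argument of `remez` are the knobs to turn, which matches the "redesign the filter" advice above.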
Detection + substitution:
1) Remove the mean of the signal.
2) Use a differentiator filter and a threshold to detect the peaks.
3) Cut all the peaks out of the signal (replace them with zeros).
4) Optionally, filter the peak out of the cut segment (see the method above).
5) For each cut peak, find the maximum cross-correlation coefficient between the cut segment and the signal without peaks, replace the segment, and apply a fade in/out to smooth the pasting.
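A rough sketch of steps 1 to 3 only; the cross-correlation substitution in steps 4 and 5 is not covered here. The function name `cut_spikes`, the `threshold` and `margin` parameters, and the use of a plain first difference as the "differentiator filter" are assumptions for illustration.

```python
import numpy as np

def cut_spikes(signal, threshold, margin=5):
    """Return (cut_signal, mask): the mean-removed signal with detected
    spike regions set to zero, and the boolean mask of the cut samples."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                               # step 1: remove the mean
    slope = np.abs(np.diff(x, prepend=x[0]))       # step 2: first difference as differentiator
    hits = slope > threshold                       # samples where the slope is too steep
    # widen every detection by `margin` samples on both sides
    kernel = np.ones(2 * margin + 1)
    mask = np.convolve(hits.astype(float), kernel, mode="same") > 0
    cut = x.copy()
    cut[mask] = 0.0                                # step 3: replace the peaks with zeros
    return cut, mask

# Toy example: slow sine with one sharp triangular spike added.
n = np.arange(1000)
resp = np.sin(2 * np.pi * n / 200.0)
resp[499:502] += [5.0, 10.0, 5.0]
cut, mask = cut_spikes(resp, threshold=1.0)
```

The threshold has to sit above the steepest slope of the normal breathing signal and below the slope of the spikes, so it would need to be tuned on the real recording.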