Notice: this page is a Chinese-English bilingual translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19642443/

Date: 2020-09-13 21:17:30  Source: igfitidea

Use of pandas.shift() to align datasets based on scipy.signal.correlate

python, pandas, scipy

Asked by not link

I have datasets that look like the following: data0, data1, data2 (analogous to time versus voltage data).

If I load and plot the datasets using code like:

import pandas as pd
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

data0 = pd.read_csv('data0.csv')
data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')

plt.plot(data0.x, data0.y, data1.x, data1.y, data2.x, data2.y)

I get something like:

[Figure: plot of all three datasets]

Now I try to correlate data0 with data1:

shft01 = np.argmax(signal.correlate(data0.y, data1.y)) - len(data1.y)
print(shft01)
plt.figure()
plt.plot(data0.x, data0.y,
         data1.x.shift(-shft01), data1.y)
fig = plt.gcf()

with output:

-99

and

[Figure: shifted version of data1]

which works just as expected! But if I try the same thing with data2, I get a plot that looks like:

[Figure: shifted version of data2]

with a positive shift of 410. I think I am just not understanding how pd.shift() works, but I was hoping that I could use pd.shift() to align my data sets. As far as I understand, the return value of correlate() tells me how far apart my data sets are, so I should be able to use shift to overlap them.
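For reference, here is a minimal sketch of what shift() does to a pandas column compared with adding an offset to it. The small DataFrame is a made-up stand-in for the CSV files, which are not shown in the question. shift() moves values by whole rows and leaves NaN where no row is available, and matplotlib does not draw points whose coordinates are NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one of the CSV files: x is time, y is voltage.
df = pd.DataFrame({"x": [0.0, 0.1, 0.2, 0.3, 0.4],
                   "y": [0.0, 1.0, 4.0, 1.0, 0.0]})

# shift(2) moves every x value two rows down and fills the gap with NaN,
# so each y ends up paired with the x of a different row.
shifted_rows = df.x.shift(2)

# Adding an offset keeps every row and moves the whole curve along the x-axis.
offset_x = df.x + 0.2

print(shifted_rows.tolist())  # [nan, nan, 0.0, 0.1, 0.2]
print(offset_x.tolist())
```

With evenly spaced x values the two operations can produce similar-looking plots, but shift() drops as many points as the size of the shift.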

Answered by HYRY

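The pandas shift() method is not the correct way to shift a curve along the x-axis: it moves values by a number of rows and fills the vacated positions with NaN, rather than adding an offset to the values. You should adjust the X values of the points instead: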

plt.plot(data0.x, data0.y)
for target in [data1, data2]:
    dx = np.mean(np.diff(data0.x.values))
    shift = (np.argmax(signal.correlate(data0.y, target.y)) - len(target.y)) * dx
    plt.plot(target.x + shift, target.y)

Here is the output:

[Figure: output plot]
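The approach above can be tried without the original CSV files. The sketch below is only an illustration under stated assumptions: the two Gaussian pulses, the sample spacing, and the names pulse, y_ref, and y_tgt are invented for the example, and the samples are assumed to be evenly spaced. It uses the `len(...) - 1` form of the lag.

```python
import numpy as np
from scipy import signal

# Synthetic, evenly sampled data (an assumption; the real CSVs are not shown).
x = np.arange(1000) * 0.01            # 1000 samples, dx = 0.01


def pulse(center, width=0.2):
    """Gaussian pulse centered at `center` (hypothetical test signal)."""
    return np.exp(-((x - center) ** 2) / (2 * width ** 2))


y_ref = pulse(4.0)    # plays the role of data0.y
y_tgt = pulse(6.5)    # plays the role of target.y, 2.5 x-units later

dx = np.mean(np.diff(x))
corr = signal.correlate(y_ref, y_tgt)        # default mode='full', length 2N-1
lag = np.argmax(corr) - (len(y_tgt) - 1)     # lag in samples
shift = lag * dx                             # lag in x units

# Same operation as `target.x + shift` in the answer's loop.
x_aligned = x + shift

print(lag)  # -250
```

The negative lag means the target has to move left by 250 samples (2.5 x-units) to overlap the reference, which is what adding `shift` to its x values does.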

Answered by AhabTheArab

@HYRY, one correction to your answer: there is an off-by-one mismatch between len(), which returns a count (one-based), and np.argmax(), which returns a zero-based index. The line should read:

shift = (np.argmax(signal.correlate(data0.y, target.y)) - (len(target.y)-1)) * dx

For example, in the case where your signals are already aligned:

len(target.y) = N (one-based)

The cross-correlation function has length 2N-1, so the center value, for aligned data, is:

np.argmax(signal.correlate(data0.y, target.y)) = N - 1 (zero-based)

shift = ((N-1) - N) * dx = (-1) * dx, when we really want 0 * dx
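The argument above can be checked numerically by correlating a signal with itself, so that the true lag is zero. This is a small sketch, not part of the original answer: the random test signal and its length are arbitrary choices. It also uses scipy.signal.correlation_lags, available in SciPy 1.6 and later, which returns the lag for each index of the correlation output and avoids the manual index arithmetic.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
y = rng.standard_normal(50)      # arbitrary test signal (assumption), N = 50
N = len(y)

corr = signal.correlate(y, y)    # identical inputs, so the true lag is 0
peak = np.argmax(corr)

print(len(corr) == 2 * N - 1)    # True
print(peak - N)                  # -1: the uncorrected formula
print(peak - (N - 1))            # 0: the corrected formula

# SciPy 1.6+ provides the lag axis directly (default mode='full').
lags = signal.correlation_lags(N, N)
print(lags[peak])                # 0
```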