Python: resampling, interpolating matrix
Notice: this page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1851384/
resampling, interpolating matrix
Asked by Phil
I'm trying to interpolate some data for the purpose of plotting. For instance, given N data points, I'd like to be able to generate a "smooth" plot, made up of 10*N or so interpolated data points.
My approach is to generate an N-by-10*N matrix and compute the inner product of the original vector and the matrix I generated, yielding a 1-by-10*N vector. I've already worked out the math I'd like to use for the interpolation, but my code is pretty slow. I'm pretty new to Python, so I'm hopeful that some of the experts here can give me some ideas of ways I can try to speed up my code.
I think part of the problem is that generating the matrix requires 10*N^2 calls to the following function:
    def sinc(x):
        import math
        try:
            return math.sin(math.pi * x) / (math.pi * x)
        except ZeroDivisionError:
            return 1.0
(This comes from sampling theory. Essentially, I'm attempting to recreate a signal from its samples, and upsample it to a higher frequency.)
The matrix is generated by the following:
    def resampleMatrix(Tso, Tsf, o, f):
        from numpy import array as npar
        retval = []
        for i in range(f):
            retval.append([sinc((Tsf*i - Tso*j)/Tso) for j in range(o)])
        return npar(retval)
I'm considering breaking up the task into smaller pieces because I don't like the idea of an N^2 matrix sitting in memory. I could probably make 'resampleMatrix' into a generator function and do the inner product row-by-row, but I don't think that will speed up my code much until I start paging stuff in and out of memory.
Thanks in advance for your suggestions!
Accepted answer by endolith
This is upsampling. See "Help with resampling/upsampling" for some example solutions.
A fast way to do this (for offline data, like your plotting application) is to use FFTs. This is what SciPy's native resample() function does. It assumes a periodic signal, though, so it's not exactly the same. See this reference:
Here's the second issue regarding time-domain real signal interpolation, and it's a big deal indeed. This exact interpolation algorithm provides correct results only if the original x(n) sequence is periodic within its full time interval.
Your function assumes the signal's samples are all 0 outside of the defined range, so the two methods will diverge away from the center point. If you pad the signal with lots of zeros first, it will produce a very close result. There are several more zeros past the edge of the plot not shown here:
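To make the comparison above concrete, here is a minimal sketch of both approaches. The helper names `sinc_upsample` and `fft_upsample_padded`, the random test signal, and the padding length are assumptions made for illustration; only `scipy.signal.resample` and `numpy.sinc` come from the discussion itself.

```python
import numpy as np
from scipy.signal import resample

def sinc_upsample(y, factor):
    """Direct sinc interpolation: samples outside the record count as 0."""
    n = len(y)
    # New sample times, measured in units of the original sample period
    t_new = np.arange(n * factor) / factor
    # (n*factor, n) matrix of sinc weights, built by broadcasting
    weights = np.sinc(t_new[:, np.newaxis] - np.arange(n)[np.newaxis, :])
    return weights @ y

def fft_upsample_padded(y, factor, pad):
    """FFT resampling after zero-padding both ends of the record."""
    padded = np.concatenate([np.zeros(pad), y, np.zeros(pad)])
    up = resample(padded, len(padded) * factor)
    # Cut the padded regions back off so the output lines up with y
    return up[pad * factor:(pad + len(y)) * factor]

rng = np.random.default_rng(0)
y = rng.standard_normal(16)          # hypothetical test data
direct = sinc_upsample(y, 10)

# Largest deviation from the direct sinc result, without and with padding
err_unpadded = np.max(np.abs(resample(y, 160) - direct))
err_padded = np.max(np.abs(fft_upsample_padded(y, 10, pad=200) - direct))
print(err_unpadded, err_padded)
```

With enough padding the periodic assumption matters much less inside the original record, so the second error should come out far smaller than the first.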
Cubic interpolation won't be correct for resampling purposes. This example is an extreme case (near the sampling frequency), but as you can see, cubic interpolation isn't even close. For lower frequencies it should be pretty accurate.
Answer by Eric O Lebigot
If you want to interpolate data in a quite general and fast way, splines or polynomials are very useful. Scipy has the scipy.interpolate module, which is very useful. You can find many examples in the official pages.
Answer by denis
Here's a minimal example of 1d interpolation with scipy -- not as much fun as reinventing, but.
The plot looks like sinc, which is no coincidence: try google spline resample "approximate sinc". (Presumably less local / more taps ⇒ better approximation, but I have no idea how local UnivariateSplines are.)

    """ interpolate with scipy.interpolate.UnivariateSpline """
    from __future__ import division
    import numpy as np
    from scipy.interpolate import UnivariateSpline
    import pylab as pl

    N = 10
    H = 8
    x = np.arange(N+1)
    xup = np.arange( 0, N, 1/H )
    y = np.zeros(N+1);  y[N//2] = 100  # a single spike in the middle

    interpolator = UnivariateSpline( x, y, k=3, s=0 )  # s=0 interpolates
    yup = interpolator( xup )
    np.set_printoptions( 1, threshold=100, suppress=True )  # .1f
    print("yup:", yup)

    pl.plot( x, y, "green",  xup, yup, "blue" )
    pl.show()
Added Feb 2010: see also basic-spline-interpolation-in-a-few-lines-of-numpy
Answer by AFoglia
I'm not quite sure what you're trying to do, but there are some speedups you can do to create the matrix. BrainCore's suggestion to use numpy.sinc is a first step, but the second is to realize that numpy functions want to work on numpy arrays, where they can do loops at C speed, and can do it faster than on individual elements.
    import numpy

    def resampleMatrix(Tso, Tsf, o, f):
        # Column of f output indices minus row of o input indices
        retval = numpy.sinc((Tsf*numpy.arange(f)[:,numpy.newaxis]
                             -Tso*numpy.arange(o)[numpy.newaxis,:])/Tso)
        return retval
The trick is that by indexing the aranges with numpy.newaxis, numpy converts the array of length i to one with shape i x 1, and the array of length j to shape 1 x j (here i and j stand for the output and input lengths, f and o in the function signature). At the subtraction step, numpy will "broadcast" each input to act as an i x j shaped array and then do the subtraction. ("Broadcast" is numpy's term, reflecting the fact that no additional copy is made to stretch the i x 1 to i x j.)
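A small sketch of what the shapes look like during that broadcast. The lengths `f = 6` and `o = 3` are arbitrary values picked for illustration, and `numpy.broadcast_to` is used only to make the "no copy" point visible; it is not part of the answer's code.

```python
import numpy as np

f, o = 6, 3                                  # hypothetical output and input lengths
rows = np.arange(f)[:, np.newaxis]           # shape (6, 1)
cols = np.arange(o)[np.newaxis, :]           # shape (1, 3)
diff = rows - cols                           # broadcast to shape (6, 3), diff[i, j] == i - j

# broadcast_to returns a read-only view: the stretched axis has stride 0,
# so the same memory is reused instead of being copied f*o times.
stretched = np.broadcast_to(rows, (f, o))

print(rows.shape, cols.shape, diff.shape)
print(stretched.strides, np.shares_memory(stretched, rows))
```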
Now the numpy.sinc can iterate over all the elements in compiled code, much quicker than any for-loop you could write.
(There's an additional speed-up available if you do the division before the subtraction, especially since in the latter the division cancels the multiplication.)
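Spelled out, (Tsf*i - Tso*j)/Tso equals (Tsf/Tso)*i - j, so the division can be done once on a scalar. A minimal sketch of that rearrangement follows; the name `resample_matrix_divided` and the sample arguments are made up for the example.

```python
import math
import numpy as np

def resample_matrix_divided(Tso, Tsf, o, f):
    # One scalar division instead of dividing the whole f-by-o array
    ratio = Tsf / Tso
    # (f, 1) column scaled by the ratio, minus a plain (1, o) row of indices
    return np.sinc(ratio * np.arange(f)[:, np.newaxis]
                   - np.arange(o)[np.newaxis, :])

# Check against the element-by-element formula on a small case
Tso, Tsf, o, f = 1.0, 0.1, 5, 50
matrix = resample_matrix_divided(Tso, Tsf, o, f)

def sinc(x):
    return math.sin(math.pi * x) / (math.pi * x) if x != 0 else 1.0

reference = np.array([[sinc((Tsf * i - Tso * j) / Tso) for j in range(o)]
                      for i in range(f)])
print(np.allclose(matrix, reference))
```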
The only drawback is that you now pay for an extra Nx10*N array to hold the difference. This might be a dealbreaker if N is large and memory is an issue.
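If memory is the concern, one possible workaround (an assumption on my part, not something stated in the answer) is to build the matrix a block of rows at a time and apply each block to the data immediately, so only a chunk-by-N slice exists at once. The function name `sinc_resample_chunked` and the `chunk` parameter are hypothetical.

```python
import numpy as np

def sinc_resample_chunked(y, Tso, Tsf, f, chunk=1024):
    """Apply the sinc matrix to y without ever holding all f rows in memory."""
    y = np.asarray(y, dtype=float)
    j = np.arange(len(y))[np.newaxis, :]     # (1, o) row of input indices
    ratio = Tsf / Tso
    out = np.empty(f)
    for start in range(0, f, chunk):
        stop = min(start + chunk, f)
        i = np.arange(start, stop)[:, np.newaxis]   # (rows in block, 1)
        # Only a (stop - start, o) block of weights is alive at this point
        out[start:stop] = np.sinc(ratio * i - j) @ y
    return out

y = np.sin(np.linspace(0.0, 3.0, 20))        # hypothetical input samples
chunked = sinc_resample_chunked(y, 1.0, 0.1, 200, chunk=32)
```

The chunk size trades Python loop overhead against peak memory, so it is worth trying a few values.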
Otherwise, you should be able to write this using numpy.convolve. From what little I just learned about sinc-interpolation, I'd say you want something like numpy.convolve(orig,numpy.sinc(numpy.arange(j)),mode="same"). But I'm probably wrong about the specifics.
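On the specifics: a sinc sampled only at whole numbers is 1 at zero and (numerically) 0 everywhere else, so the kernel has to be sampled at the new, finer rate, and the input has to be spread onto that finer grid first. Here is one way this could look, assuming an integer upsampling factor; the function name and the zero-stuffing step are my own illustration rather than the answer's code.

```python
import numpy as np

def sinc_upsample_convolve(y, factor):
    """Sinc interpolation by an integer factor, written as a convolution."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Put the original samples on the fine grid with zeros in between
    stuffed = np.zeros(n * factor)
    stuffed[::factor] = y
    # Sinc kernel sampled at the fine rate, wide enough to span the record
    half = n * factor
    kernel = np.sinc(np.arange(-half, half + 1) / factor)
    full = np.convolve(stuffed, kernel)      # mode="full" by default
    # The kernel centre sits at index `half`, so lag 0 starts there
    return full[half:half + n * factor]

y = np.array([0.0, 1.0, 0.5, -1.0, 0.25])    # hypothetical samples
up = sinc_upsample_convolve(y, 4)
```

This produces the same numbers as multiplying by the full sinc matrix. For long signals, an FFT-based convolution would likely be faster than numpy.convolve.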
Answer by Don O'Donnell
If your only interest is to 'generate a "smooth" plot' I would just go with a simple polynomial spline curve fit:
For any two adjacent data points the coefficients of a third degree polynomial function can be computed from the coordinates of those data points and the two additional points to their left and right (disregarding boundary points). This will generate points on a nice smooth curve with a continuous first derivative. There's a straightforward formula for converting 4 coordinates to 4 polynomial coefficients but I don't want to deprive you of the fun of looking it up ;o).
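The answer leaves the formula as an exercise, so the following is only a guess at what is meant: the Catmull-Rom spline is one well-known cubic that uses exactly those four points per interval and has a continuous first derivative. The sketch assumes uniformly spaced samples and handles the boundaries by repeating the end points, which is one choice among several.

```python
import numpy as np

def catmull_rom(y, factor):
    """Upsample uniformly spaced samples with a Catmull-Rom cubic (assumed formula)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Repeat the end points so every interval has a neighbour on each side
    p = np.concatenate([[y[0]], y, [y[-1]]])
    # Positions inside one interval, 0 <= t < 1, as a (1, factor) row
    t = (np.arange(factor) / factor)[np.newaxis, :]
    # For interval k: p0, p1, p2, p3 are the four surrounding points
    p0, p1, p2, p3 = (p[k:k + n - 1, np.newaxis] for k in range(4))
    seg = 0.5 * (2 * p1
                 + (-p0 + p2) * t
                 + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t**2
                 + (-p0 + 3 * p1 - 3 * p2 + p3) * t**3)
    # Flatten interval by interval and add the final sample
    return np.append(seg.ravel(), y[-1])

y = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical data on a straight line
smooth = catmull_rom(y, 8)
```

The curve passes through every original sample, which makes it suitable for plotting, though it is not the band-limited reconstruction the question asks about.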
Answer by taleinat
Your question isn't entirely clear; you're trying to optimize the code you posted, right?
Re-writing sinc like this should speed it up considerably. This implementation avoids checking that the math module is imported on every call, doesn't do attribute access three times, and replaces exception handling with a conditional expression:
    from math import sin, pi

    def sinc(x):
        return (sin(pi * x) / (pi * x)) if x != 0 else 1.0
You could also try avoiding creating the matrix twice (and holding it twice in parallel in memory) by creating a numpy.array directly (not from a list of lists):
    import numpy

    def resampleMatrix(Tso, Tsf, o, f):
        retval = numpy.zeros((f, o))  # allocate the result once
        for i in xrange(f):
            for j in xrange(o):
                retval[i, j] = sinc((Tsf*i - Tso*j)/Tso)
        return retval
(replace xrange with range on Python 3.0 and above)
Finally, you can create rows with numpy.arange as well as calling numpy.sinc on each row or even on the entire matrix:
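    import numpy

    def resampleMatrix(Tso, Tsf, o, f):
        retval = numpy.zeros((f, o))
        for i in xrange(f):
            # Row i holds Tsf*i/Tso - j for j = 0..o-1; subtracting an integer
            # arange always gives exactly o elements, unlike a float-step arange
            retval[i] = Tsf*i / float(Tso) - numpy.arange(o)
        return numpy.sinc(retval)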
This should be significantly faster than your original implementation. Try different combinations of these ideas and test their performance, see which works out the best!
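For the "test their performance" part, a rough harness along these lines may help. It is a sketch using the standard-library `timeit` module; the function names, the matrix size, and the repeat count are arbitrary choices, and `range` is used so it runs on Python 3.

```python
import math
import timeit
import numpy as np

def sinc(x):
    return (math.sin(math.pi * x) / (math.pi * x)) if x != 0 else 1.0

def matrix_loops(Tso, Tsf, o, f):
    # Element-by-element fill with the pure-Python sinc
    out = np.zeros((f, o))
    for i in range(f):
        for j in range(o):
            out[i, j] = sinc((Tsf * i - Tso * j) / Tso)
    return out

def matrix_rows(Tso, Tsf, o, f):
    # Row-by-row fill, then one numpy.sinc call on the whole matrix
    out = np.zeros((f, o))
    for i in range(f):
        out[i] = Tsf * i / Tso - np.arange(o)
    return np.sinc(out)

args = (1.0, 0.1, 40, 400)                   # hypothetical problem size
for func in (matrix_loops, matrix_rows):
    seconds = timeit.timeit(lambda: func(*args), number=3)
    print(func.__name__, seconds)
```

Timings depend heavily on the machine and on N, so it is worth rerunning this at the sizes you actually plot.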
Answer by BrainCore
Small improvement. Use the built-in numpy.sinc(x) function which runs in compiled C code.
Possible larger improvement: Can you do the interpolation on the fly (as the plotting occurs)? Or are you tied to a plotting library that only accepts a matrix?
Answer by Escualo
I recommend that you check your algorithm, as it is a non-trivial problem. Specifically, I suggest you gain access to the article "Function Plotting Using Conic Splines" (IEEE Computer Graphics and Applications) by Hu and Pavlidis (1991). Their algorithm implementation allows for adaptive sampling of the function, such that the rendering time is smaller than with regularly spaced approaches.
The abstract follows:
A method is presented whereby, given a mathematical description of a function, a conic spline approximating the plot of the function is produced. Conic arcs were selected as the primitive curves because there are simple incremental plotting algorithms for conics already included in some device drivers, and there are simple algorithms for local approximations by conics. A split-and-merge algorithm for choosing the knots adaptively, according to shape analysis of the original function based on its first-order derivatives, is introduced.