Python 如何使用 scipy 执行二维插值?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/37872171/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How can I perform two-dimensional interpolation using scipy?
提问by Andras Deak
This Q&A is intended as a canonical(-ish) concerning two-dimensional (and multi-dimensional) interpolation using scipy. There are often questions concerning the basic syntax of various multidimensional interpolation methods, I hope to set these straight too.
此问答旨在作为关于使用 scipy 的二维(和多维)插值的规范(-ish)。关于各种多维插值方法的基本语法经常有问题,我也希望能把这些说清楚。
I have a set of scattered two-dimensional data points, and I would like to plot them as a nice surface, preferably using something like contourf
or plot_surface
in matplotlib.pyplot
. How can I interpolate my two-dimensional or multidimensional data to a mesh using scipy?
我有一组分散的二维数据点,我想将它们绘制为一个漂亮的表面,最好使用类似contourf
或plot_surface
in 的东西matplotlib.pyplot
。如何使用 scipy 将二维或多维数据插入网格?
I've found the scipy.interpolate
sub-package, but I keep getting errors when using interp2d
or bisplrep
or griddata
or rbf
. What is the proper syntax of these methods?
我找到了scipy.interpolate
子包,但是在使用interp2d
orbisplrep
或griddata
or时我不断收到错误消息rbf
。这些方法的正确语法是什么?
回答by Andras Deak
Disclaimer: I'm mostly writing this post with syntactical considerations and general behaviour in mind. I'm not familiar with the memory and CPU aspect of the methods described, and I aim this answer at those who have reasonably small sets of data, such that the quality of the interpolation can be the main aspect to consider. I am aware that when working with very large data sets, the better-performing methods (namely griddata
and Rbf
) might not be feasible.
免责声明:我写这篇文章主要是考虑到语法和一般行为。我不熟悉所描述方法的内存和 CPU 方面,我将这个答案针对那些拥有相当小的数据集的人,因此插值的质量可以成为要考虑的主要方面。我知道在处理非常大的数据集时,性能更好的方法(即griddata
和Rbf
)可能不可行。
I'm going to compare three kinds of multi-dimensional interpolation methods (interp2d
/splines, griddata
and Rbf
). I will subject them to two kinds of interpolation tasks and two kinds of underlying functions (points from which are to be interpolated). The specific examples will demonstrate two-dimensional interpolation, but the viable methods are applicable in arbitrary dimensions. Each method provides various kinds of interpolation; in all cases I will use cubic interpolation (or something close1). It's important to note that whenever you use interpolation you introduce bias compared to your raw data, and the specific methods used affect the artifacts that you will end up with. Always be aware of this, and interpolate responsibly.
我将比较三种多维插值方法(interp2d
/splinesgriddata
和Rbf
)。我将让它们接受两种插值任务和两种底层函数(要插值的点)。具体例子将演示二维插值,但可行的方法适用于任意维度。每种方法都提供了各种插值;在所有情况下,我都会使用三次插值(或接近1 的东西)。重要的是要注意,无论何时使用插值,与原始数据相比都会引入偏差,并且使用的特定方法会影响最终得到的工件。始终注意这一点,并负责任地进行插值。
The two interpolation tasks will be
这两个插值任务将是
- upsampling (input data is on a rectangular grid, output data is on a denser grid)
- interpolation of scattered data onto a regular grid
- 上采样(输入数据在矩形网格上,输出数据在更密集的网格上)
- 将分散数据插值到规则网格上
The two functions (over the domain [x,y] in [-1,1]x[-1,1]
) will be
这两个函数(在域上[x,y] in [-1,1]x[-1,1]
)将是
- a smooth and friendly function:
cos(pi*x)*sin(pi*y)
; range in[-1, 1]
- an evil (and in particular, non-continuous) function:
x*y/(x^2+y^2)
with a value of 0.5 near the origin; range in[-0.5, 0.5]
- 平稳,友好的功能:
cos(pi*x)*sin(pi*y)
; 范围在[-1, 1]
- 邪恶(尤其是非连续)函数:
x*y/(x^2+y^2)
在原点附近的值为 0.5;范围在[-0.5, 0.5]
Here's how they look:
以下是它们的外观:
I will first demonstrate how the three methods behave under these four tests, then I'll detail the syntax of all three. If you know what you should expect from a method, you might not want to waste your time learning its syntax (looking at you, interp2d
).
我将首先演示这三种方法在这四种测试下的表现,然后我将详细介绍所有三种方法的语法。如果您知道应该从方法中得到什么,您可能不想浪费时间学习它的语法(看着您,interp2d
)。
Test data
测试数据
For the sake of explicitness, here is the code with which I generated the input data. While in this specific case I'm obviously aware of the function underlying the data, I will only use this to generate input for the interpolation methods. I use numpy for convenience (and mostly for generating the data), but scipy alone would suffice too.
为了明确起见,这里是我生成输入数据的代码。虽然在这种特定情况下,我显然知道数据背后的函数,但我只会使用它来为插值方法生成输入。我使用 numpy 是为了方便(主要是为了生成数据),但单独使用 scipy 也足够了。
import numpy as np
import scipy.interpolate as interp
# auxiliary function for mesh generation
def gimme_mesh(n):
minval = -1
maxval = 1
# produce an asymmetric shape in order to catch issues with transpositions
return np.meshgrid(np.linspace(minval,maxval,n), np.linspace(minval,maxval,n+1))
# set up underlying test functions, vectorized
def fun_smooth(x, y):
return np.cos(np.pi*x)*np.sin(np.pi*y)
def fun_evil(x, y):
# watch out for singular origin; function has no unique limit there
return np.where(x**2+y**2>1e-10, x*y/(x**2+y**2), 0.5)
# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse,y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)
# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
x_scattered,y_scattered = np.random.rand(2,N_scattered**2)*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)
# dense output mesh, 20x21 in shape
N_dense = 20
x_dense,y_dense = gimme_mesh(N_dense)
Smooth function and upsampling
平滑函数和上采样
Let's start with the easiest task. Here's how an upsampling from a mesh of shape [6,7]
to one of [20,21]
works out for the smooth test function:
让我们从最简单的任务开始。以下是从形状网格向上采样[6,7]
到[20,21]
平滑测试函数之一的方法:
Even though this is a simple task, there are already subtle differences between the outputs. At a first glance all three outputs are reasonable. There are two features to note, based on our prior knowledge of the underlying function: the middle case of griddata
distorts the data most. Note the y==-1
boundary of the plot (nearest the x
label): the function should be strictly zero (since y==-1
is a nodal line for the smooth function), yet this is not the case for griddata
. Also note the x==-1
boundary of the plots (behind, to the left): the underlying function has a local maximum (implying zero gradient near the boundary) at [-1, -0.5]
, yet the griddata
output shows clearly non-zero gradient in this region. The effect is subtle, but it's a bias none the less. (The fidelity of Rbf
is even better with the default choice of radial functions, dubbed multiquadratic
.)
尽管这是一项简单的任务,但输出之间已经存在细微差别。乍一看,所有三个输出都是合理的。根据我们对底层函数的先验知识,有两个特征需要注意:中间情况griddata
最扭曲数据。注意y==-1
绘图的边界(最靠近x
标签):函数应该严格为零(因为y==-1
是平滑函数的节点线),但对于 来说情况并非如此griddata
。还要注意图的x==-1
边界(后面,左侧):底层函数在 处有一个局部最大值(意味着边界附近的梯度为零)[-1, -0.5]
,但griddata
输出清楚地显示该区域中的非零梯度。效果是微妙的,但它仍然是一种偏见。(保真度Rbf
使用径向函数的默认选择甚至更好,称为multiquadratic
。)
Evil function and upsampling
邪恶函数和上采样
A bit harder task is to perform upsampling on our evil function:
更难的任务是对我们的邪恶函数执行上采样:
Clear differences are starting to show among the three methods. Looking at the surface plots, there are clear spurious extrema appearing in the output from interp2d
(note the two humps on the right side of the plotted surface). While griddata
and Rbf
seem to produce similar results at first glance, the latter seems to produce a deeper minimum near [0.4, -0.4]
that is absent from the underlying function.
三种方法之间开始出现明显的差异。查看曲面图,从interp2d
(注意绘制曲面右侧的两个驼峰)的输出中出现了明显的虚假极值。虽然griddata
和Rbf
乍一看似乎产生了类似的结果,但后者似乎产生了一个更深的最小值,接近[0.4, -0.4]
底层函数所不存在的最小值。
However, there is one crucial aspect in which Rbf
is far superior: it respects the symmetry of the underlying function (which is of course also made possible by the symmetry of the sample mesh). The output from griddata
breaks the symmetry of the sample points, which is already weakly visible in the smooth case.
但是,有一个关键方面Rbf
要优越得多:它尊重基础函数的对称性(当然,样本网格的对称性也使之成为可能)。的输出griddata
破坏了采样点的对称性,这在平滑情况下已经很弱可见。
Smooth function and scattered data
平滑函数和分散的数据
Most often one wants to perform interpolation on scattered data. For this reason I expect these tests to be more important. As shown above, the sample points were chosen pseudo-uniformly in the domain of interest. In realistic scenarios you might have additional noise with each measurement, and you should consider whether it makes sense to interpolate your raw data to begin with.
大多数情况下,人们希望对分散的数据进行插值。出于这个原因,我希望这些测试更加重要。如上所示,样本点是在感兴趣的域中伪均匀地选择的。在现实场景中,每次测量可能会产生额外的噪音,您应该考虑从插入原始数据开始是否有意义。
Output for the smooth function:
平滑函数的输出:
Now there's already a bit of a horror show going on. I clipped the output from interp2d
to between [-1, 1]
exclusively for plotting, in order to preserve at least a minimal amount of information. It's clear that while some of the underlying shape is present, there are huge noisy regions where the method completely breaks down. The second case of griddata
reproduces the shape fairly nicely, but note the white regions at the border of the contour plot. This is due to the fact that griddata
only works inside the convex hull of the input data points (in other words, it doesn't perform any extrapolation). I kept the default NaN value for output points lying outside the convex hull.2Considering these features, Rbf
seems to perform best.
现在已经有一些恐怖节目正在上演。我将输出从interp2d
到 之间[-1, 1]
专门用于绘图,以便至少保留最少量的信息。很明显,虽然存在一些基本形状,但存在巨大的噪声区域,该方法完全失效。第二种情况griddata
很好地再现了形状,但请注意等高线图边界处的白色区域。这是因为griddata
它只在输入数据点的凸包内起作用(换句话说,它不执行任何外推)。我保留了位于凸包外的输出点的默认 NaN 值。2考虑到这些功能,Rbf
似乎表现最好。
Evil function and scattered data
邪恶的功能和分散的数据
And the moment we've all been waiting for:
而我们一直在等待的那一刻:
It's no huge surprise that interp2d
gives up. In fact, during the call to interp2d
you should expect some friendly RuntimeWarning
s complaining about the impossibility of the spline to be constructed. As for the other two methods, Rbf
seems to produce the best output, even near the borders of the domain where the result is extrapolated.
interp2d
放弃并不奇怪。事实上,在给interp2d
您打电话的过程中,应该会遇到一些友好的RuntimeWarning
s 抱怨无法构造样条曲线。至于其他两种方法,Rbf
似乎产生了最好的输出,即使在结果外推的域的边界附近也是如此。
So let me say a few words about the three methods, in decreasing order of preference (so that the worst is the least likely to be read by anybody).
因此,让我按偏好的降序对这三种方法说几句(这样最坏的方法最不可能被任何人阅读)。
scipy.interpolate.Rbf
scipy.interpolate.Rbf
The Rbf
class stands for "radial basis functions". To be honest I've never considered this approach until I started researching for this post, but I'm pretty sure I'll be using these in the future.
该Rbf
级代表“径向基函数”。老实说,在我开始研究这篇文章之前,我从未考虑过这种方法,但我很确定我将来会使用这些方法。
Just like the spline-based methods (see later), usage comes in two steps: first one creates a callable Rbf
class instance based on the input data, and then calls this object for a given output mesh to obtain the interpolated result. Example from the smooth upsampling test:
就像基于样条的方法(见后),使用分为两个步骤:首先Rbf
根据输入数据创建一个可调用的类实例,然后为给定的输出网格调用这个对象以获得插值结果。平滑上采样测试的示例:
import scipy.interpolate as interp
zfun_smooth_rbf = interp.Rbf(x_sparse, y_sparse, z_sparse_smooth, function='cubic', smooth=0) # default smooth=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(x_dense, y_dense) # not really a function, but a callable class instance
Note that both input and output points were 2d arrays in this case, and the output z_dense_smooth_rbf
has the same shape as x_dense
and y_dense
without any effort. Also note that Rbf
supports arbitrary dimensions for interpolation.
请注意,在这种情况下,输入和输出点都是二维数组,输出z_dense_smooth_rbf
具有相同的形状x_dense
,y_dense
无需任何努力。另请注意,Rbf
支持任意尺寸的插值。
So, scipy.interpolate.Rbf
所以, scipy.interpolate.Rbf
- produces well-behaved output even for crazy input data
- supports interpolation in higher dimensions
- extrapolates outside the convex hull of the input points (of course extrapolation is always a gamble, and you should generally not rely on it at all)
- creates an interpolator as a first step, so evaluating it in various output points is less additional effort
- can have output points of arbitrary shape (as opposed to being constrained to rectangular meshes, see later)
- prone to preserving the symmetry of the input data
- supports multiple kinds of radial functions for keyword
function
:multiquadric
,inverse
,gaussian
,linear
,cubic
,quintic
,thin_plate
and user-defined arbitrary
- 即使对于疯狂的输入数据也能产生良好的输出
- 支持更高维度的插值
- 在输入点的凸包外外推(当然外推总是一场赌博,您通常根本不应该依赖它)
- 创建一个插值器作为第一步,因此在不同的输出点对其进行评估会减少额外的工作量
- 可以有任意形状的输出点(而不是被限制为矩形网格,见下文)
- 倾向于保持输入数据的对称性
- 支持关键字的多种径向函数
function
:multiquadric
、inverse
、gaussian
、linear
、cubic
、quintic
、thin_plate
和用户定义的任意
scipy.interpolate.griddata
scipy.interpolate.griddata
My former favourite, griddata
, is a general workhorse for interpolation in arbitrary dimensions. It doesn't perform extrapolation beyond setting a single preset value for points outside the convex hull of the nodal points, but since extrapolation is a very fickle and dangerous thing, this is not necessarily a con. Usage example:
我以前最喜欢的,griddata
是用于任意维度插值的通用主力。除了为节点凸包外的点设置单个预设值外,它不会执行外推,但由于外推是一件非常善变和危险的事情,因此这不一定是骗局。用法示例:
z_dense_smooth_griddata = interp.griddata(np.array([x_sparse.ravel(),y_sparse.ravel()]).T,
z_sparse_smooth.ravel(),
(x_dense,y_dense), method='cubic') # default method is linear
Note the slightly kludgy syntax. The input points have to be specified in an array of shape [N, D]
in D
dimensions. For this we first have to flatten our 2d coordinate arrays (using ravel
), then concatenate the arrays and transpose the result. There are multiple ways to do this, but all of them seem to be bulky. The input z
data also have to be flattened. We have a bit more freedom when it comes to the output points: for some reason these can also be specified as a tuple of multidimensional arrays. Note that the help
of griddata
is misleading, as it suggests that the same is true for the inputpoints (at least for version 0.17.0):
请注意略显笨拙的语法。输入点必须[N, D]
在D
尺寸形状的数组中指定。为此,我们首先必须展平我们的 2d 坐标数组(使用ravel
),然后连接数组并转置结果。有多种方法可以做到这一点,但它们似乎都很笨重。输入z
数据也必须扁平化。当涉及到输出点时,我们有更多的自由:出于某种原因,这些也可以指定为多维数组的元组。请注意,help
ofgriddata
具有误导性,因为它表明输入点也是如此(至少对于版本 0.17.0):
griddata(points, values, xi, method='linear', fill_value=nan, rescale=False)
Interpolate unstructured D-dimensional data.
Parameters
----------
points : ndarray of floats, shape (n, D)
Data point coordinates. Can either be an array of
shape (n, D), or a tuple of `ndim` arrays.
values : ndarray of float or complex, shape (n,)
Data values.
xi : ndarray of float, shape (M, D)
Points at which to interpolate data.
In a nutshell, scipy.interpolate.griddata
简而言之, scipy.interpolate.griddata
- produces well-behaved output even for crazy input data
- supports interpolation in higher dimensions
- does not perform extrapolation, a single value can be set for the output outside the convex hull of the input points (see
fill_value
) - computes the interpolated values in a single call, so probing multiple sets of output points starts from scratch
- can have output points of arbitrary shape
- supports nearest-neighbour and linear interpolation in arbitrary dimensions, cubic in 1d and 2d. Nearest-neighbour and linear interpolation use
NearestNDInterpolator
andLinearNDInterpolator
under the hood, respectively. 1d cubic interpolation uses a spline, 2d cubic interpolation usesCloughTocher2DInterpolator
to construct a continuously differentiable piecewise-cubic interpolator. - might violate the symmetry of the input data
- 即使对于疯狂的输入数据也能产生良好的输出
- 支持更高维度的插值
- 不执行外推,可以为输入点凸包外的输出设置单个值(请参阅
fill_value
) - 在单个调用中计算内插值,因此从头开始探测多组输出点
- 可以有任意形状的输出点
- 支持任意维度的最近邻和线性插值,1d 和 2d 的三次。最近邻和线性插值分别使用
NearestNDInterpolator
和LinearNDInterpolator
在引擎盖下。1d 三次插值使用样条,2d 三次插值用于CloughTocher2DInterpolator
构造连续可微的分段三次插值器。 - 可能违反输入数据的对称性
scipy.interpolate.interp2d
/scipy.interpolate.bisplrep
scipy.interpolate.interp2d
/scipy.interpolate.bisplrep
The only reason I'm discussing interp2d
and its relatives is that it has a deceptive name, and people are likely to try using it. Spoiler alert: don't use it (as of scipy version 0.17.0). It's already more special than the previous subjects in that it's specifically used for two-dimensional interpolation, but I suspect this is by far the most common case for multivariate interpolation.
我interp2d
和它的亲戚讨论的唯一原因是它有一个具有欺骗性的名字,人们很可能会尝试使用它。剧透警告:不要使用它(从 scipy 版本 0.17.0 开始)。它已经比之前的主题更加特殊,因为它专门用于二维插值,但我怀疑这是迄今为止多元插值最常见的情况。
As far as syntax goes, interp2d
is similar to Rbf
in that it first needs constructing an interpolation instance, which can be called to provide the actual interpolated values. There's a catch, however: the output points have to be located on a rectangular mesh, so inputs going into the call to the interpolator have to be 1d vectors which span the output grid, as if from numpy.meshgrid
:
就语法而言,interp2d
类似于Rbf
首先需要构造一个插值实例,该实例可以被调用以提供实际的插值值。然而,有一个问题:输出点必须位于矩形网格上,因此进入内插器调用的输入必须是跨越输出网格的一维向量,就像来自numpy.meshgrid
:
# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind='cubic') # default kind is 'linear'
# reminder: x_dense and y_dense are of shape [20, 21] from numpy.meshgrid
xvec = x_dense[0,:] # 1d array of unique x values, 20 elements
yvec = y_dense[:,0] # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec,yvec) # output is [20, 21]-shaped array
One of the most common mistakes when using interp2d
is putting your full 2d meshes into the interpolation call, which leads to explosive memory consumption, and hopefully to a hasty MemoryError
.
使用时最常见的错误之一interp2d
是将完整的 2d 网格放入插值调用中,这会导致爆炸性的内存消耗,并且可能会导致MemoryError
.
Now, the greatest problem with interp2d
is that it often doesn't work. In order to understand this, we have to look under the hood. It turns out that interp2d
is a wrapper for the lower-level functions bisplrep
+bisplev
, which are in turn wrappers for FITPACK routines (written in Fortran). The equivalent call to the previous example would be
现在,最大的问题interp2d
是它通常不起作用。为了理解这一点,我们必须深入了解。事实证明,它interp2d
是低级函数bisplrep
+ 的bisplev
包装器,而后者又是 FITPACK 例程(用 Fortran 编写)的包装器。对上一个示例的等效调用是
kind = 'cubic'
if kind=='linear':
kx=ky=1
elif kind=='cubic':
kx=ky=3
elif kind=='quintic':
kx=ky=5
# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(),y_sparse.ravel(),z_sparse_smooth.ravel(),kx=kx,ky=ky,s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec,yvec,bisp_smooth).T # note the transpose
Now, here's the thing about interp2d
: (in scipy version 0.17.0) there is a nice comment in interpolate/interpolate.py
for interp2d
:
现在,这里的事儿interp2d
:(在SciPy的版本0.17.0或更新版本)有一个很好的注释interpolate/interpolate.py
为interp2d
:
if not rectangular_grid:
# TODO: surfit is really not meant for interpolation!
self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)
and indeed in interpolate/fitpack.py
, in bisplrep
there's some setup and ultimately
事实上interpolate/fitpack.py
,在bisplrep
那里有一些设置,最终
tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
task, s, eps, tx, ty, nxest, nyest,
wrk, lwrk1, lwrk2)
And that's it. The routines underlying interp2d
are not really meant to perform interpolation. They might suffice for sufficiently well-behaved data, but under realistic circumstances you will probably want to use something else.
就是这样。底层的例程interp2d
并不是真的要执行插值。它们可能足以满足行为良好的数据,但在现实情况下,您可能想要使用其他东西。
Just to conclude, interpolate.interp2d
总结一下, interpolate.interp2d
- can lead to artifacts even with well-tempered data
- is specifically for bivariate problems (although there's the limited
interpn
for input points defined on a grid) - performs extrapolation
- creates an interpolator as a first step, so evaluating it in various output points is less additional effort
- can only produce output over a rectangular grid, for scattered output you would have to call the interpolator in a loop
- supports linear, cubic and quintic interpolation
- might violate the symmetry of the input data
- 即使使用经过良好处理的数据也可能导致伪影
- 专门用于二元问题(尽管
interpn
网格上定义的输入点有限制) - 执行外推
- 创建一个插值器作为第一步,因此在不同的输出点对其进行评估会减少额外的工作量
- 只能在矩形网格上产生输出,对于分散的输出,您必须在循环中调用插值器
- 支持线性、三次和五次插值
- 可能违反输入数据的对称性
1I'm fairly certain that the cubic
and linear
kind of basis functions of Rbf
do not exactly correspond to the other interpolators of the same name.
2These NaNs are also the reason for why the surface plot seems so odd: matplotlib historically has difficulties with plotting complex 3d objects with proper depth information. The NaN values in the data confuse the renderer, so parts of the surface that should be in the back are plotted to be in the front. This is an issue with visualization, and not interpolation.
1我相当肯定的基函数cubic
和linear
基函数的种类Rbf
并不完全对应于同名的其他插值器。
2这些 NaN 也是表面图看起来如此奇怪的原因:matplotlib 历来难以用适当的深度信息绘制复杂的 3d 对象。数据中的 NaN 值会混淆渲染器,因此应该在后面的表面部分被绘制在前面。这是可视化的问题,而不是插值。