Source: Stack Overflow, http://stackoverflow.com/questions/32501347/
License: CC BY-SA 4.0. You are free to use and share this content, but you must follow the same license and attribute it to the original authors.

How to apply cubic spline interpolation over long Pandas Series?

Tags: python, pandas, interpolation

Asked by Crolle

I need to replace missing data within a pandas Series using cubic spline interpolation. I figured out that I could use the pandas.Series.interpolate(method='cubic') method, which looks like this:

import numpy as np
import pandas as pd

# create series
size = 50
x = np.linspace(-2, 5, size)
y = pd.Series(np.sin(x))

# deleting data segment
y[10:30] = np.nan

# interpolation
y = y.interpolate(method='cubic')

Although this method works just fine for small series (size = 50), it seems to cause the program to freeze for larger ones (size = 5000). Is there a workaround?

Answered by chrisb

pandas calls out to the scipy interpolation routines; I'm not sure why 'cubic' is so memory hungry and slow.

As a workaround, you could use method='spline' (which wraps scipy's UnivariateSpline). With the right parameters (order=3, s=0.), it gives essentially the same results (there seem to be some floating point differences) and is dramatically faster.
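Since the two results are not expected to match bit for bit, a tolerance-based comparison is a more useful check than ==. The following is a minimal sketch of that check, assuming scipy is installed; the names close and max_abs_diff and the tolerance of 1e-6 are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same setup as in the question, with a larger series.
size = 2000
x = np.linspace(-2, 5, size)
y = pd.Series(np.sin(x))

# Delete a data segment (positional slice).
y.iloc[10:30] = np.nan

# Both methods are delegated to scipy by pandas.
cubic = y.interpolate(method='cubic')
spline = y.interpolate(method='spline', order=3, s=0.)

# Exact equality is too strict for floating point results,
# so compare within a tolerance (1e-6 is an arbitrary choice here).
close = np.allclose(cubic, spline, atol=1e-6)

# Largest absolute difference between the two interpolated series.
max_abs_diff = (cubic - spline).abs().max()

print(close, max_abs_diff)
```

If the largest difference is far below the precision you need, the two methods can be treated as interchangeable for that data set.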

In [104]: # create series
     ...: size = 2000
     ...: x = np.linspace(-2, 5, size)
     ...: y = pd.Series(np.sin(x))
     ...: 
     ...: # deleting data segment
     ...: y[10:30] = np.nan
     ...: 

In [105]: %time cubic = y.interpolate(method='cubic')
Wall time: 4.94 s

In [106]: %time spline = y.interpolate(method='spline', order=3, s=0.)
Wall time: 1 ms

In [107]: (cubic == spline).all()
Out[107]: False

In [108]: pd.concat([cubic, spline], axis=1).loc[5:35, :]
Out[108]: 
           0         1
5  -0.916444 -0.916444
6  -0.917840 -0.917840
7  -0.919224 -0.919224
8  -0.920597 -0.920597
9  -0.921959 -0.921959
10 -0.923309 -0.923309
11 -0.924649 -0.924649
12 -0.925976 -0.925976
13 -0.927293 -0.927293