Extrapolate Pandas DataFrame

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/34159342/

Time: 2020-09-14 00:21:07  Source: igfitidea

Tags: python, python-2.7, pandas, extrapolation

Asked by Nyxynyx

It is easy to interpolate values in a pandas.DataFrame using Series.interpolate. How can extrapolation be done?

For example, given a DataFrame as shown, how can we extrapolate it 14 more months to 2014-12-31? Linear extrapolation is fine.

import pandas as pd

X1 = list(range(10))
X2 = [x**2 for x in X1]
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))

I am thinking that a new DataFrame must first be created, with a DatetimeIndex starting from 2013-11-30 and extending for 14 more M (month-end) periods. Beyond that I'm stuck.

Answered by tmthydvnprt

Extrapolating a DataFrame with a DatetimeIndex index

This can be done with two steps:

  1. Extend the DatetimeIndex
  2. Extrapolate the data

Extend the Index

Overwrite df with a new DataFrame where the data is resampled onto a new extended index based on the original index's start, period and frequency. This allows the original df to come from anywhere, as in the csv example case. With this the columns get conveniently filled with NaNs!

import pandas as pd

# Fake DataFrame for example (could come from anywhere)
# freq='M' is the month-end frequency (spelled 'ME' from pandas 2.2 on)
X1 = list(range(10))
X2 = [x**2 for x in X1]
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))

# Number of months to extend
extend = 5

# Extrapolate the index first based on original index
df = pd.DataFrame(
    data=df,
    index=pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=df.index.freq
    )
)

# Display
print(df)


            x1  x2
2013-01-31   0   0
2013-02-28   1   1
2013-03-31   2   4
2013-04-30   3   9
2013-05-31   4  16
2013-06-30   5  25
2013-07-31   6  36
2013-08-31   7  49
2013-09-30   8  64
2013-10-31   9  81
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 NaN NaN
2014-02-28 NaN NaN
2014-03-31 NaN NaN
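The question asks for 14 more months, up to 2014-12-31, rather than a fixed count of 5. A minimal sketch of the same index extension driven by a target end date is below. It uses df.reindex instead of the DataFrame constructor and pd.offsets.MonthEnd() instead of the 'M' alias; both are substitutions made here, not part of the original answer, and the names target_end, extended and n_added are illustrative only.

```python
import pandas as pd

# Same fake data as above, with the month-end frequency given as an offset object
df = pd.DataFrame(
    {"x1": list(range(10)), "x2": [x**2 for x in range(10)]},
    index=pd.date_range("2013-01-31", periods=10, freq=pd.offsets.MonthEnd()),
)

# Assumed target: the last date the question wants to reach
target_end = pd.Timestamp("2014-12-31")

# Build the extended index from the original start and frequency up to the target
new_index = pd.date_range(start=df.index[0], end=target_end, freq=df.index.freq)

# reindex keeps the existing rows and fills the new ones with NaN
extended = df.reindex(new_index)

# Number of rows that were appended
n_added = len(extended) - len(df)
print(n_added)
print(extended.tail())
```

Going through reindex keeps the intent explicit: existing rows are matched by label and the dates that were not there before come back as NaN, ready for the extrapolation step.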

Extrapolate the data

Most extrapolators require numeric inputs rather than dates. Temporarily swapping the date index for a plain integer index takes care of that:

# Temporarily remove dates and make index numeric
di = df.index
# drop=True discards the old index instead of keeping it as an 'index' column
df = df.reset_index(drop=True)

See this answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial.

Snippet from answer

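import pandas as pd
from scipy.optimize import curve_fit

# Assumed setup (not shown in this snippet): a 3rd order polynomial model,
# an arbitrary initial guess, and the rows that already hold data
def func(x, a, b, c, d):
    return a * x**3 + b * x**2 + c * x + d

guess = (0.5, 0.5, 0.5, 0.5)
fit_df = df.dropna()
col_params = {}

# Curve fit each column
for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, guess)
    # Store optimized parameters
    col_params[col] = params[0]

# Extrapolate each column
for col in df.columns:
    # Find the rows that are NaN in the column and their index values
    missing = pd.isnull(df[col])
    x = df.index[missing].astype(float).values
    # Extrapolate those points with the fitted function
    df.loc[missing, col] = func(x, *col_params[col])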

Once the columns are extrapolated, put the dates back:

# Put date index back
df.index = di

# Display
print(df)


            x1   x2
2013-01-31   0    0
2013-02-28   1    1
2013-03-31   2    4
2013-04-30   3    9
2013-05-31   4   16
2013-06-30   5   25
2013-07-31   6   36
2013-08-31   7   49
2013-09-30   8   64
2013-10-31   9   81
2013-11-30  10  100
2013-12-31  11  121
2014-01-31  12  144
2014-02-28  13  169
2014-03-31  14  196
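Since the question says linear extrapolation is fine, a lighter alternative to the cubic curve fit is a straight-line fit per column. The sketch below swaps in np.polyfit with degree 1 in place of scipy's curve_fit, and uses row positions as the numeric x axis so the date index never has to be removed. The helper name extrapolate_linear and its parameters are hypothetical, made up for this illustration, and pd.offsets.MonthEnd() stands in for the 'M' alias.

```python
import numpy as np
import pandas as pd


def extrapolate_linear(frame, extend):
    """Append `extend` periods and fill them from a straight-line fit per column.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    if frame.index.freq is None:
        raise ValueError("index needs a fixed frequency")
    # Step 1: extend the index from the original start and frequency
    new_index = pd.date_range(
        start=frame.index[0],
        periods=len(frame) + extend,
        freq=frame.index.freq,
    )
    out = frame.reindex(new_index).astype(float)
    # Row positions act as the numeric x axis, so the dates stay in the index
    x = np.arange(len(out), dtype=float)
    # Step 2: fit and extrapolate each column
    for col in out.columns:
        known = out[col].notna().to_numpy()
        # Degree-1 least-squares fit on the rows that hold data
        slope, intercept = np.polyfit(x[known], out.loc[known, col].to_numpy(), 1)
        # Fill only the missing rows with the fitted line
        out.loc[~known, col] = slope * x[~known] + intercept
    return out


df = pd.DataFrame(
    {"x1": list(range(10)), "x2": [x**2 for x in range(10)]},
    index=pd.date_range("2013-01-31", periods=10, freq=pd.offsets.MonthEnd()),
)
result = extrapolate_linear(df, extend=5)
print(result.tail(5))
```

A straight line reproduces x1 because that column is linear, but it underestimates x2, which is quadratic. That is the trade-off against the 3rd order polynomial used above: fewer parameters and no scipy dependency, at the cost of missing any curvature in the data.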