Extrapolate Pandas DataFrame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/34159342/
Asked by Nyxynyx
It is easy to interpolate values in a Pandas DataFrame using Series.interpolate, but how can extrapolation be done?
For example, given a DataFrame as shown, how can we extrapolate it 14 more months to 2014-12-31? Linear extrapolation is fine.
import pandas as pd

X1 = range(10)
X2 = [x**2 for x in X1]
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))
I am thinking that a new DataFrame must first be created, with the DatetimeIndex starting from 2013-11-30 and extending for 14 more M periods. Beyond that I'm stuck.
Answered by tmthydvnprt
Extrapolating a DataFrame with a DatetimeIndex
This can be done with two steps:
- Extend the DatetimeIndex
- Extrapolate the data
Extend the Index
Overwrite df with a new DataFrame in which the data is reindexed onto a new, extended index built from the original index's start, number of periods and frequency. This allows the original df to come from anywhere, as in the csv example case. Rows that exist only in the extended index are conveniently filled with NaNs!
import pandas as pd

# Fake DataFrame for example (could come from anywhere)
X1 = range(10)
X2 = [x**2 for x in X1]
# Month-end frequency; newer pandas versions spell this alias 'ME'
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))

# Number of months to extend
extend = 5

# Extrapolate the index first based on original index
df = pd.DataFrame(
    data=df,
    index=pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=df.index.freq
    )
)

# Display
print(df)
             x1   x2
2013-01-31    0    0
2013-02-28    1    1
2013-03-31    2    4
2013-04-30    3    9
2013-05-31    4   16
2013-06-30    5   25
2013-07-31    6   36
2013-08-31    7   49
2013-09-30    8   64
2013-10-31    9   81
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31  NaN  NaN
2014-02-28  NaN  NaN
2014-03-31  NaN  NaN
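The same index extension can also be written with DataFrame.reindex, which some readers may find more explicit than passing the frame back into the constructor. This is a minimal sketch, not the answer's own code: the names new_index and extended are illustrative, and pd.offsets.MonthEnd() is used instead of a frequency string so that the example does not depend on which alias a given pandas version accepts.

```python
import pandas as pd

# Example frame with a month-end DatetimeIndex (same data as above)
idx = pd.date_range("2013-01-31", periods=10, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"x1": range(10), "x2": [x**2 for x in range(10)]}, index=idx)

extend = 5  # number of extra periods to append

# Build the longer index from the original start and frequency
new_index = pd.date_range(
    start=df.index[0],
    periods=len(df) + extend,
    freq=df.index.freq,
)

# reindex keeps the existing rows and fills the new ones with NaN
extended = df.reindex(new_index)
print(extended.tail(6))
```

Unlike the constructor form, reindex returns a new frame and leaves df untouched, which makes it easier to compare the frame before and after the extension.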
Extrapolate the data
Most extrapolators will require the inputs to be numeric instead of dates. This can be done with:
# Temporarily remove dates and make index numeric
di = df.index
# drop=True discards the old date index instead of keeping it as a column
df = df.reset_index(drop=True)
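Replacing the dates with row positions treats every month as the same length. If the actual elapsed time matters, one alternative (an assumption on my part, not something the original answer does) is to use the number of days since the first timestamp as the numeric x values. A small sketch, with days as an illustrative name:

```python
import pandas as pd

# Four month-end dates, just to show the conversion
idx = pd.date_range("2013-01-31", periods=4, freq=pd.offsets.MonthEnd())

# Days elapsed since the first timestamp, as floats
days = (idx - idx[0]).days.to_numpy(dtype=float)
print(days)
```

For regularly spaced monthly data the difference from row positions is small, so the simpler positional index used in the answer is usually adequate.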
See this answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial.
Snippet from answer:
# Note: fit_df, func, guess and col_params are defined in the linked answer and are not shown here

# Curve fit each column
for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, guess)
    # Store optimized parameters
    col_params[col] = params[0]

# Extrapolate each column
for col in df.columns:
    # Find the rows where the column is NaN and get their index values
    missing = df[col].isnull()
    x = df.index[missing].astype(float).values
    # Extrapolate those points with the fitted function
    df.loc[missing, col] = func(x, *col_params[col])
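Because the snippet relies on names defined elsewhere, here is a self-contained sketch of the same idea: fit a cubic to the known rows of each column, then fill the NaN rows. It swaps in numpy.polyfit and numpy.polyval in place of scipy's curve_fit, so it is not the linked answer's code; coeffs, missing and x_new are illustrative names.

```python
import numpy as np
import pandas as pd

# Frame with a plain numeric index; rows 10..14 are NaN and will be extrapolated
df = pd.DataFrame(
    {"x1": np.arange(10.0), "x2": np.arange(10.0) ** 2}
).reindex(range(15))

# Only rows without missing values are used for fitting
fit_df = df.dropna()
x_fit = fit_df.index.to_numpy(dtype=float)

for col in df.columns:
    # Least-squares cubic fit; coefficients are ordered from highest degree down
    coeffs = np.polyfit(x_fit, fit_df[col].to_numpy(), 3)
    # Evaluate the fitted polynomial at the index positions that are still NaN
    missing = df[col].isna()
    x_new = df.index[missing].to_numpy(dtype=float)
    df.loc[missing, col] = np.polyval(coeffs, x_new)

print(df.round(3))
```

A polynomial fit is convenient here because the example data is exactly linear and quadratic; for noisy data a high-order polynomial can diverge quickly outside the fitted range, so a lower degree may be the safer choice.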
Once the columns are extrapolated, put the dates back:
# Put date index back
df.index = di
# Display
print(df)
            x1   x2
2013-01-31   0    0
2013-02-28   1    1
2013-03-31   2    4
2013-04-30   3    9
2013-05-31   4   16
2013-06-30   5   25
2013-07-31   6   36
2013-08-31   7   49
2013-09-30   8   64
2013-10-31   9   81
2013-11-30  10  100
2013-12-31  11  121
2014-01-31  12  144
2014-02-28  13  169
2014-03-31  14  196
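Since the question says linear extrapolation is fine and asks for 14 more months, the two steps can be combined into one small function. This is a sketch under stated assumptions: extrapolate_linear is a hypothetical helper name, the index is assumed to be a regular DatetimeIndex with a frequency set, and "linear" is taken to mean a least-squares straight line through all known points of each column (not a continuation of the last segment).

```python
import numpy as np
import pandas as pd


def extrapolate_linear(df, periods):
    """Hypothetical helper: extend the index by `periods` and fill with a straight-line fit."""
    # Step 1: extend the index from the original start and frequency
    new_index = pd.date_range(
        start=df.index[0], periods=len(df) + periods, freq=df.index.freq
    )
    out = df.reindex(new_index).astype(float)

    # Step 2: use the row position as the numeric x and fit each column
    x = np.arange(len(out), dtype=float)
    for col in out.columns:
        known = out[col].notna().to_numpy()
        slope, intercept = np.polyfit(x[known], out[col].to_numpy()[known], 1)
        out.loc[~known, col] = slope * x[~known] + intercept
    return out


idx = pd.date_range("2013-01-31", periods=10, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"x1": range(10), "x2": [x**2 for x in range(10)]}, index=idx)

result = extrapolate_linear(df, 14)
print(result.tail(3))
```

With 10 original rows and 14 added periods the index ends at 2014-12-31, as the question asks. Note that a straight line fits x1 exactly but only approximates the quadratic x2 column.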