Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36627442/

Time: 2020-08-19 18:08:30  Source: igfitidea

Add trend line to pandas

python pandas matplotlib machine-learning statsmodels

Asked by FooBar

I have time-series data, as follows:

                  emplvl
date                    
2003-01-01  10955.000000
2003-04-01  11090.333333
2003-07-01  11157.000000
2003-10-01  11335.666667
2004-01-01  11045.000000
2004-04-01  11175.666667
2004-07-01  11135.666667
2004-10-01  11480.333333
2005-01-01  11441.000000
2005-04-01  11531.000000
2005-07-01  11320.000000
2005-10-01  11516.666667
2006-01-01  11291.000000
2006-04-01  11223.000000
2006-07-01  11230.000000
2006-10-01  11293.000000
2007-01-01  11126.666667
2007-04-01  11383.666667
2007-07-01  11535.666667
2007-10-01  11567.333333
2008-01-01  11226.666667
2008-04-01  11342.000000
2008-07-01  11201.666667
2008-10-01  11321.000000
2009-01-01  11082.333333
2009-04-01  11099.000000
2009-07-01  10905.666667

time series graph

I would like to add, in the simplest way, a linear trend (with intercept) to this graph. I would also like to estimate this trend using only the data before, say, 2006.

I've found some answers here, but they all use statsmodels. First of all, these answers might not be up to date: pandas has improved and now includes an OLS component itself. Second, statsmodels appears to estimate an individual fixed effect for each time period instead of a linear trend. I suppose I could compute a running-quarter variable myself, but there must be a more convenient way of doing this?

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 emplvl   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                     0.000
Date:                tor, 14 apr 2016   Prob (F-statistic):                nan
Time:                        17:17:43   Log-Likelihood:                 929.85
No. Observations:                  40   AIC:                            -1780.
Df Residuals:                       0   BIC:                            -1712.
Df Model:                          39                                         
Covariance Type:            nonrobust                                         
============================================================================================================
                                               coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------------------------------
Intercept                                 1.095e+04        inf          0        nan           nan       nan
date[T.Timestamp('2003-04-01 00:00:00')]   135.3333        inf          0        nan           nan       nan
date[T.Timestamp('2003-07-01 00:00:00')]   202.0000        inf          0        nan           nan       nan
date[T.Timestamp('2003-10-01 00:00:00')]   380.6667        inf          0        nan           nan       nan
date[T.Timestamp('2004-01-01 00:00:00')]    90.0000        inf          0        nan           nan       nan
date[T.Timestamp('2004-04-01 00:00:00')]   220.6667        inf          0        nan           nan       nan

How do I, in the simplest way possible, estimate this trend and add the predicted values as a column to my data frame?

Answered by Stefan

Here's a quick example of how to do this using pandas.ols:

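import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: a line with slope 2 and intercept 10, plus integer noise
x = pd.Series(np.arange(50))
y = pd.Series(10 + (2 * x + np.random.randint(-5, 5, 50)))
# pd.ols is only available in older pandas releases
regression = pd.ols(y=y, x=x)
regression.summary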

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         50
Number of Degrees of Freedom:   2

R-squared:         0.9913
Adj R-squared:     0.9911

Rmse:              2.7625

F-stat (1, 48):  5465.1446, p-value:     0.0000

Degrees of Freedom: model 1, resid 48

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     2.0013     0.0271      73.93     0.0000     1.9483     2.0544
     intercept     9.5271     0.7698      12.38     0.0000     8.0183    11.0358
---------------------------------End of Summary---------------------------------

trend = regression.predict(beta=regression.beta, x=x[20:]) # slicing to only use last 30 points
data = pd.DataFrame(index=x, data={'y': y, 'trend': trend})
data.plot() # add kwargs for title and other layout/design aspects
plt.show() # or plt.gcf().savefig(path)
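As far as I know, pandas.ols was deprecated and later removed from pandas (around version 0.20), so the snippet above will not run on a current installation. The following is only a minimal sketch of the same idea using numpy.polyfit, which is a swapped-in technique and not part of the original answer. It uses the first 16 rows of the question's data, and the column names `t` and `trend` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# First 16 quarterly observations copied from the question (2003 through 2006).
values = [10955.000000, 11090.333333, 11157.000000, 11335.666667,
          11045.000000, 11175.666667, 11135.666667, 11480.333333,
          11441.000000, 11531.000000, 11320.000000, 11516.666667,
          11291.000000, 11223.000000, 11230.000000, 11293.000000]
index = pd.date_range("2003-01-01", periods=len(values), freq="QS")
df = pd.DataFrame({"emplvl": values}, index=index)
df.index.name = "date"

# Running-quarter counter 0, 1, 2, ... (hypothetical helper column).
df["t"] = np.arange(len(df))

# Estimate slope and intercept only from observations before 2006.
train = df[df.index < "2006-01-01"]
slope, intercept = np.polyfit(train["t"], train["emplvl"], deg=1)

# Evaluate the fitted line on every row, including 2006 and later.
df["trend"] = intercept + slope * df["t"]
```

Calling `df[["emplvl", "trend"]].plot()` should then draw the series and the trend line on the same axes.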

Answered by Paul H

In general you should create your matplotlib figure and axes objects ahead of time, and explicitly plot the dataframe on them:

from matplotlib import pyplot
import pandas
import statsmodels.api as sm

df = pandas.read_csv(...)

fig, ax = pyplot.subplots()
df.plot(x='xcol', y='ycol', ax=ax)

Then you still have that axes object around to use directly to plot your line:

model = sm.formula.ols(formula='ycol ~ xcol', data=df)
res = model.fit()
df.assign(fit=res.fittedvalues).plot(x='xcol', y='fit', ax=ax)
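The question's regression output shows one coefficient per timestamp, which is what a formula produces when the date column is treated as categorical. A possible way around this, sketched below under assumptions, is to regress on a numeric running-quarter counter, fit on the rows before 2006 only, and predict for all rows. The data here is synthetic and made up for illustration, and the column names `t` and `trend` are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic quarterly series (NOT the question's data): linear trend plus noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2003-01-01", periods=27, freq="QS")
df = pd.DataFrame({
    "date": dates,
    "emplvl": 11000 + 10 * np.arange(27) + rng.normal(0, 100, 27),
})

# Numeric time variable, so the model estimates a single slope
# instead of one dummy coefficient per date.
df["t"] = np.arange(len(df))

# Fit on observations before 2006 only.
train = df[df["date"] < "2006-01-01"]
res = smf.ols("emplvl ~ t", data=train).fit()

# Predict the trend for every row and store it as a new column.
df["trend"] = res.predict(df)
```

The resulting `trend` column can be drawn on an existing axes object in the same way as above, for example with `df.plot(x='date', y='trend', ax=ax)`.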