Linear regression with pandas time series

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/37337836/

Time: 2020-09-14 01:15:41  Source: igfitidea

python pandas

Question by vandelay

I have a DataFrame containing EUR_USD currency-pair quotes at 1-second intervals. In principle the interval could be anything, though; for example, at 1-minute intervals it could look like this:

2015-11-10 01:00:00+01:00    1.07616
2015-11-10 01:01:00+01:00    1.07605
2015-11-10 01:02:00+01:00    1.07590
2015-11-10 01:03:00+01:00    1.07592
2015-11-10 01:04:00+01:00    1.07583
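To make the question's snippets reproducible, here is one way to rebuild the five sample rows as a DataFrame. This is a sketch, not the asker's actual loading code: the column name closeAsk is taken from the question's own df["closeAsk"], and the fixed +01:00 offset is read off the printed timestamps.

```python
import pandas as pd

# Timestamps from the sample: 1-minute steps with a fixed +01:00 UTC offset
index = pd.date_range("2015-11-10 01:00:00+01:00", periods=5, freq="min")

# Column name "closeAsk" matches the question's df["closeAsk"]
df = pd.DataFrame(
    {"closeAsk": [1.07616, 1.07605, 1.07590, 1.07592, 1.07583]},
    index=index,
)
```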

I'd like to use linear regression to draw a trend line through the data in the DataFrame, but I'm not sure what the best way is to do that with a time series, especially one with such short intervals.

So far I've experimented by replacing the timestamps with a list of integers from 0 up to the length of the series (this is just to show where I'd like to go with it):

import numpy as np
import matplotlib.pyplot as plt
x = list(range(len(df)))     # 0, 1, ..., len(df) - 1 stand in for the timestamps
y = df["closeAsk"].tolist()

Using NumPy to do the math magic, a first-degree polynomial fit (i.e. a straight line):

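fit = np.polyfit(x, y, 1)   # coefficients, highest power first: [slope, intercept]
fit_fn = np.poly1d(fit)     # callable: fit_fn(t) == slope * t + intercept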
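For the five sample values, this is what the fit returns. np.polyfit orders coefficients from the highest power down, so for degree 1 the result is [slope, intercept]. Because x is just the row position, the slope is a change per sample (here per minute), not per unit of time in general. A sketch using only the values shown in the question:

```python
import numpy as np

# The five closeAsk values from the question's sample
y = [1.07616, 1.07605, 1.07590, 1.07592, 1.07583]
x = list(range(len(y)))  # 0, 1, 2, 3, 4 stand in for the timestamps

fit = np.polyfit(x, y, 1)  # highest power first: [slope, intercept]
slope, intercept = fit
fit_fn = np.poly1d(fit)    # fit_fn(t) == slope * t + intercept

print(f"slope = {slope:.6f} per sample, intercept = {intercept:.5f}")
# slope = -0.000079 per sample, intercept = 1.07613
```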

Lastly, I plot the fitted line together with df["closeAsk"] to see the trend:

plt.plot(x,df["closeAsk"], '-')
plt.plot(x,y, 'yo', x, fit_fn(x), '--k')
plt.show()

However, the x-axis now just shows meaningless numbers; I'd like it to show the timestamps instead.

Answer by lanery

To elaborate on my comment:

Say you have some evenly spaced time-series data, time, and some correlated data, data, as you laid out in your question.

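import numpy as np
import pandas as pd

time = pd.date_range('9:00', '10:00', freq='1s')   # one hour of 1-second timestamps
data = np.cumsum(np.random.randn(time.size))       # random walk as example data

df = pd.DataFrame({'time' : time,
                   'data' : data})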

As you've shown, you can do a linear fit of the data with np.polyfit and create the trend line with np.poly1d.

x = np.arange(time.size) # = array([0, 1, 2, ..., 3598, 3599, 3600])
fit = np.polyfit(x, df['data'], 1)
fit_fn = np.poly1d(fit)
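np.arange(time.size) only lines up with the clock when the samples are evenly spaced. Since the question notes that the interval could be anything, here is a hedged variant that is not part of this answer: it derives x from the timestamps themselves, as seconds elapsed since the first sample, so the slope comes out in data units per second. The uneven timestamps and the synthetic drift of 2e-6 per second are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical, unevenly spaced sample: 500 distinct seconds within one hour
rng = np.random.default_rng(42)
seconds = np.sort(rng.choice(3600, size=500, replace=False))
time = pd.Timestamp("2015-11-10 09:00") + pd.to_timedelta(seconds, unit="s")

# Synthetic values: a known drift of 2e-6 per second plus noise
data = 1.076 + 2e-6 * seconds + rng.normal(0, 1e-5, seconds.size)
df = pd.DataFrame({"time": time, "data": data})

# x = seconds elapsed since the first timestamp, taken from the timestamps themselves
x = (df["time"] - df["time"].iloc[0]).dt.total_seconds().to_numpy()

fit = np.polyfit(x, df["data"], 1)  # [slope per second, intercept]
fit_fn = np.poly1d(fit)
trend = fit_fn(x)  # trend value at each actual timestamp
```

The result can be plotted exactly as in the answer, with df['time'] on the x-axis and trend on the y-axis.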

Then plot the data and the fit with df['time'] as the x-axis.

import matplotlib.pyplot as plt
plt.plot(df['time'], fit_fn(x), 'k-')         # fitted trend line
plt.plot(df['time'], df['data'], 'go', ms=2)  # raw data as small green dots
plt.show()
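With a full hour of 1-second data, the default date ticks can get crowded. One option, which is an addition and not part of the answer, is matplotlib's DateFormatter together with fig.autofmt_xdate(). The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of data as in the answer: one hour of 1-second samples
time = pd.date_range("9:00", "10:00", freq="1s")
data = np.cumsum(np.random.randn(time.size))
df = pd.DataFrame({"time": time, "data": data})

x = np.arange(time.size)
fit_fn = np.poly1d(np.polyfit(x, df["data"], 1))

fig, ax = plt.subplots()
ax.plot(df["time"], df["data"], "go", ms=2, label="data")
ax.plot(df["time"], fit_fn(x), "k-", label="linear trend")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # clock time only
fig.autofmt_xdate()  # rotate tick labels so they don't overlap
ax.legend()
```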

Answer by knagaev

Maybe seaborn will do what you need? Try seaborn.regplot:

Plot the relationship between two variables in a DataFrame
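regplot fits an ordinary least-squares line on numeric x values, so passing the timestamps straight in is not something to rely on. The following sketch shows one workaround and assumes that Matplotlib date numbers are acceptable as the regression variable: convert the index with matplotlib.dates.date2num, fit with regplot, then format the ticks back into clock times. The close_ask series is hypothetical random-walk data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
import seaborn as sb

# Hypothetical stand-in for the question's data: two hours of 1-minute quotes
idx = pd.date_range("2015-11-10 01:00", periods=120, freq="min")
rng = np.random.default_rng(1)
close_ask = pd.Series(1.076 + np.cumsum(rng.normal(0, 2e-5, idx.size)), index=idx)

# Regress on Matplotlib date numbers (days since the epoch) instead of raw timestamps
x = mdates.date2num(idx)
y = close_ask.to_numpy()

ax = sb.regplot(x=x, y=y, ci=None, scatter_kws={"s": 8})
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # ticks as clock times
ax.set_xlabel("time")
ax.set_ylabel("closeAsk")
```

With this approach the slope is in price units per day, because that is the unit of Matplotlib date numbers.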

Answer by Bj?rn

You can create the x-values with NumPy's linspace, with the same length as your data points, like so:

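import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt

y = df["closeAsk"].dropna()              # or df["closeAsk"].bfill() to fill gaps instead
x = np.linspace(1, len(y), num=len(y))   # 1, 2, ..., len(y)

sb.regplot(x=x, y=y)                     # scatter plus fitted regression line
plt.show()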
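seaborn.regplot returns the matplotlib Axes, not the fitted coefficients. With the default order=1 it draws an ordinary least-squares line, so np.polyfit applied to the same x and y should reproduce that line and give you its slope and intercept. The sketch below checks this on a hypothetical series that includes a few gaps.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sb

# Hypothetical stand-in for df["closeAsk"]: a random walk with a few missing values
rng = np.random.default_rng(0)
idx = pd.date_range("2015-11-10 01:00", periods=200, freq="min")
close_ask = pd.Series(1.076 + np.cumsum(rng.normal(0, 1e-5, idx.size)), index=idx)
close_ask.iloc[[10, 50, 51]] = np.nan

# Same x and y construction as in the answer
y = close_ask.dropna()
x = np.linspace(1, len(y), num=len(y))

# Coefficients of the least-squares line, highest power first
slope, intercept = np.polyfit(x, y, 1)

# regplot draws the line but only returns the Axes
ax = sb.regplot(x=x, y=y, ci=None)
line_x, line_y = ax.lines[0].get_data()
print(f"slope = {slope:.3e} per step, intercept = {intercept:.5f}")
```

Note that dropna() closes the gaps, so after missing rows the positions 1..len(y) no longer correspond one-to-one to equally spaced times.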