pandas: Linear regression with pandas time series

Question

Asked by vandelay

I have a DataFrame that contains 1-second intervals of the EUR_USD currency pair. In theory it could be any interval, and in this case it could look like this:

2015-11-10 01:00:00+01:00    1.07616
2015-11-10 01:01:00+01:00    1.07605
2015-11-10 01:02:00+01:00    1.07590
2015-11-10 01:03:00+01:00    1.07592
2015-11-10 01:04:00+01:00    1.07583

I'd like to use linear regression to draw a trend line from the data in the DataFrame, but I'm not sure what the best way is to do that with a time series, especially one with such a small interval.

So far I've experimented by replacing the time with a list ranging from 0 to the length of the time series (this is just to show where I'd like to go with it).

x = list(range(0, len(df.index.tolist()), 1))
y = df["closeAsk"].tolist()

Then I use numpy to do the math magic:

fit = np.polyfit(x,y,1)
fit_fn = np.poly1d(fit)

Lastly I draw the function along with the df["closeAsk"] to make sense of the trend.

plt.plot(x,df["closeAsk"], '-')
plt.plot(x,y, 'yo', x, fit_fn(x), '--k')
plt.show()

However, the x-axis now shows only meaningless numbers. Instead, I'd like it to show the time series timestamps.

Answer 1

Answered by lanery

To elaborate on my comment:

Say you have some evenly spaced time series data, time, and some correlated data, data, as you've laid out in your question.

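import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# One sample per second from 9:00 to 10:00, with a random walk as the data
time = pd.date_range('9:00', '10:00', freq='1s')
data = np.cumsum(np.random.randn(time.size))

df = pd.DataFrame({'time' : time,
                   'data' : data})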

As you've shown, you can do a linear fit of the data with np.polyfit and create the trend line with np.poly1d.

x = np.arange(time.size) # = array([0, 1, 2, ..., 3598, 3599, 3600])
fit = np.polyfit(x, df['data'], 1)
fit_fn = np.poly1d(fit)

Then plot the data and the fit with df['time'] as the x-axis.

plt.plot(df['time'], fit_fn(x), 'k-')
plt.plot(df['time'], df['data'], 'go', ms=2)
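
The fit above uses the sample position as x, which relies on the data being evenly spaced. If the timestamps were irregular, one possible variation (a sketch, not part of the original answer) is to fit against elapsed seconds instead. The sample timestamps, the synthetic data column, and the names elapsed and trend below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: irregularly spaced timestamps (not from the question's data).
time = pd.to_datetime([
    "2015-11-10 01:00:00",
    "2015-11-10 01:01:00",
    "2015-11-10 01:03:00",
    "2015-11-10 01:04:00",
    "2015-11-10 01:08:00",
])
df = pd.DataFrame({"time": time})

# Elapsed seconds since the first sample serve as the regression x-values.
elapsed = (df["time"] - df["time"].iloc[0]).dt.total_seconds().to_numpy()

# Synthetic data lying exactly on a line, so the fitted coefficients are easy to check.
df["data"] = 1.0 + 0.5 * elapsed

# Degree-1 polyfit returns the coefficients highest power first: slope, then intercept.
slope, intercept = np.polyfit(elapsed, df["data"], 1)

# Evaluate the trend line at each timestamp; it can be plotted against df["time"].
df["trend"] = slope * elapsed + intercept
```

Plotting df["time"] against df["trend"] then keeps real timestamps on the x-axis, in the same way as the plot calls above.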

Answer 2

Answered by knagaev

Maybe you will be happy with seaborn? Try seaborn.regplot.

Answer 3

Answered by Bj?rn

You can create a NumPy linspace for the x-values with the same length as your data points, like so:

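import numpy as np
import seaborn as sb

y = df["closeAsk"].dropna()  # or .bfill() to fill gaps instead of dropping them
x = np.linspace(1, len(y), num=len(y))

# Pass x and y by keyword; recent seaborn versions do not accept them positionally
sb.regplot(x=x, y=y)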
