Python linear regression: predict by date

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/40217369/

Time: 2020-08-19 23:18:14  Source: igfitidea

Tags: python, date, pandas, linear-regression

Asked by jeangelj

I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format.

This is the dataframe I have:

data_df = 
date          value
2016-01-15    1555
2016-01-16    1678
2016-01-17    1789
...  

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

y = np.asarray(data_df['value'])
X = data_df[['date']]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=.7, random_state=42)

model = LinearRegression()  # create linear regression object
model.fit(X_train, y_train)  # train model on train data
model.score(X_train, y_train)  # check score

print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
coefs = zip(model.coef_, X.columns)
model.__dict__  # inspect the fitted attributes
print("sl = %.1f + " % model.intercept_ +
      " + ".join("%.1f %s" % coef for coef in coefs))  # linear model

I tried to convert the date, but without success:

data_df['conv_date'] = data_df.date.apply(lambda x: x.toordinal())

data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%M-%D")

Answered by Chandan

Linear regression doesn't work on date data, so we need to convert the dates into numerical values. The following code converts each date into its ordinal number:

import datetime as dt
import pandas as pd

# 'date' is the column name used in the question's dataframe
data_df['date'] = pd.to_datetime(data_df['date'])
data_df['date'] = data_df['date'].map(dt.datetime.toordinal)
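
Putting the conversion together with the model from the question, a minimal end-to-end sketch might look like the following. The three sample rows come from the question; the `date_ordinal` column name and the target date 2016-01-20 are assumptions made up for illustration.

```python
import datetime as dt

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows copied from the question; a real frame would have more.
data_df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

# Parse the strings, then map each timestamp to its ordinal day number.
data_df["date"] = pd.to_datetime(data_df["date"])
data_df["date_ordinal"] = data_df["date"].map(dt.datetime.toordinal)  # hypothetical column name

model = LinearRegression()
model.fit(data_df[["date_ordinal"]], data_df["value"])

# A future date must go through the same conversion before predicting.
future = pd.DataFrame({"date_ordinal": [dt.date(2016, 1, 20).toordinal()]})
prediction = model.predict(future)[0]
print(round(prediction, 1))
```

The point to keep in mind is that whatever conversion is applied to the training dates also has to be applied to the date being predicted.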

Answered by Siraj S.

Convert:

1) the date column to the dataframe index

df = df.set_index('date', append=False)

2) the datetime object to a float64 object

df['julian_date'] = df.index.to_julian_date()  # stored in a new column so the values are not overwritten

Then run the regression with the date as the independent variable.
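
As a rough illustration of what the Julian date conversion produces, the sketch below converts a few timestamps and a single future date. The timestamps are hypothetical; the midday one is included only to show that the time of day ends up in the fractional part.

```python
import pandas as pd

# Hypothetical timestamps: two midnights and one midday reading.
stamps = pd.to_datetime(
    ["2016-01-15 00:00", "2016-01-16 00:00", "2016-01-16 12:00"]
)

# DatetimeIndex.to_julian_date() returns float64 day counts.
julian = stamps.to_julian_date()
print(list(julian))

# A single future date can be converted the same way for prediction.
future_x = pd.Timestamp("2016-01-20").to_julian_date()
print(future_x)
```

One full day corresponds to a step of 1.0, so the fitted slope can be read as change in value per day.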

Answered by qmaruf

Linear regression works on numerical data, and the datetime type is not appropriate for it. You should split that column into three separate columns (year, month and day) and then remove the original column.
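
One way this split could be done with pandas is sketched below. The sample rows are from the question; the `year`, `month` and `day` column names and the `features` variable are assumptions for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
data_df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-15", "2016-01-16", "2016-01-17"]),
    "value": [1555, 1678, 1789],
})

# Pull the calendar parts out through the .dt accessor.
data_df["year"] = data_df["date"].dt.year
data_df["month"] = data_df["date"].dt.month
data_df["day"] = data_df["date"].dt.day

# Drop the original datetime column once the parts are extracted.
features = data_df.drop(columns=["date"])
print(features.columns.tolist())
```

Note that with this layout the model sees year, month and day as three unrelated features, so it has no built-in notion that 31 January and 1 February are adjacent; a single numeric date column may suit a simple trend line better.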

Answered by Thomas Vetterli

When using dt.datetime.toordinal, be careful: it only converts the date part and does not take hours, minutes, seconds, etc. into account. To generate a numeric value from full datetime objects, you can use something like:

import time
df['Datetime column'].apply(lambda x: time.mktime(x.timetuple()))
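
The difference between the two conversions can be seen with a small standard-library sketch. The two timestamps are hypothetical values on the same calendar day.

```python
import datetime as dt
import time

# Two hypothetical readings on the same day, 12 h 15 min apart.
morning = dt.datetime(2016, 1, 15, 8, 30)
evening = dt.datetime(2016, 1, 15, 20, 45)

# toordinal() only sees the calendar date, so both map to the same integer.
same_ordinal = morning.toordinal() == evening.toordinal()
print(same_ordinal)

# mktime() reads the time tuple as local time and returns seconds since the epoch.
morning_ts = time.mktime(morning.timetuple())
evening_ts = time.mktime(evening.timetuple())
print(evening_ts - morning_ts)
```

Two caveats apply to this approach: `time.mktime` interprets the values in the machine's local time zone, and `timetuple()` carries no microseconds, so sub-second precision is lost.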