Python linear regression: predict by date
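Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow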
Original URL: http://stackoverflow.com/questions/40217369/
Question by jeangelj
I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format.
This is the dataframe I have:
data_df =
date value
2016-01-15 1555
2016-01-16 1678
2016-01-17 1789
...
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

y = np.asarray(data_df['value'])
X = data_df[['date']]  # the date column is the only feature
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=.7, random_state=42)
model = LinearRegression()  # create linear regression object
model.fit(X_train, y_train)  # train model on train data; fails here because 'date' is not numeric
model.score(X_train, y_train)  # check score
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
coefs = zip(model.coef_, X.columns)
model.__dict__
print("sl = %.1f + " % model.intercept_ +
      " + ".join("%.1f %s" % coef for coef in coefs))  # linear model
I tried to convert the date, unsuccessfully:
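data_df['conv_date'] = data_df.date.apply(lambda x: x.toordinal())  # fails if 'date' holds strings: str has no toordinal()
data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%M-%D")  # %M means minutes and %D is not day-of-month; "%Y-%m-%d" matches 2016-01-15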
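For reference, a minimal sketch of what the two attempts were probably aiming for, assuming data_df holds the dates as 'YYYY-MM-DD' strings (the three sample rows are made up to match the layout above): parse the strings with the correct format codes first, and only then call toordinal() on the resulting Timestamps.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's data_df (dates as strings).
data_df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

# Parse first: %m is the month and %d the day of month (%M would mean minutes).
data_df["date"] = pd.to_datetime(data_df["date"], format="%Y-%m-%d")

# Each element is now a pandas Timestamp, which inherits toordinal() from datetime.
data_df["conv_date"] = data_df["date"].apply(lambda x: x.toordinal())
print(data_df)
```

Consecutive days end up one unit apart in conv_date, which is what a linear model needs.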
Answer by Chandan
Linear regression doesn't work on date data directly, so the dates need to be converted into numerical values. The following code converts each date into its ordinal number (the proleptic Gregorian ordinal, where January 1 of year 1 is day 1):
import datetime as dt
import pandas as pd
data_df['date'] = pd.to_datetime(data_df['date'])
data_df['date'] = data_df['date'].map(dt.datetime.toordinal)  # one integer per day
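A minimal end-to-end sketch of this approach: fit scikit-learn's LinearRegression on the ordinals and predict a value for a future date. The three sample rows, the column name date_ordinal and the target date 2016-01-20 are hypothetical, and with only three points the train/test split from the question is left out.

```python
import datetime as dt

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample data in the question's layout.
data_df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

# Convert the dates to proleptic Gregorian ordinals (one unit per day).
data_df["date"] = pd.to_datetime(data_df["date"])
data_df["date_ordinal"] = data_df["date"].map(dt.datetime.toordinal)

# scikit-learn expects a 2-D feature matrix, hence the double brackets.
X = data_df[["date_ordinal"]]
y = data_df["value"]
model = LinearRegression().fit(X, y)

# Predict for a future date by converting it exactly the same way.
future = pd.Timestamp("2016-01-20")
X_future = pd.DataFrame({"date_ordinal": [future.toordinal()]})
prediction = model.predict(X_future)[0]
print(f"Predicted value on {future.date()}: {prediction:.1f}")
```

The coefficient is then read as "change in value per day", since one ordinal unit is one day.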
Answer by Siraj S.
Convert:
1) the date column into the dataframe index
df['date'] = pd.to_datetime(df['date'])  # to_julian_date() needs a DatetimeIndex
df = df.set_index('date', append=False)
2) the datetime index into float64 values (Julian dates)
df['julian_date'] = df.index.to_julian_date()  # store in a new column instead of overwriting df
Then run the regression with the converted date as the independent variable.
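A minimal sketch of these steps on hypothetical sample data. Two choices here are assumptions of this sketch, not part of the answer: np.polyfit stands in for LinearRegression to keep it short, and the first Julian date is subtracted before fitting, because raw Julian dates are around 2.46 million and fitting on such large values loses precision.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the question's layout.
df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

# 1) Parse the dates and use them as the index (to_julian_date needs a DatetimeIndex).
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# 2) Store the Julian dates (float64) in a column instead of overwriting df.
df["julian"] = df.index.to_julian_date()

# 3) Fit value = slope * days + intercept, with days counted from the first date.
origin = df["julian"].iloc[0]
slope, intercept = np.polyfit(df["julian"] - origin, df["value"], deg=1)

# Predict a future date by converting it the same way.
future_days = pd.Timestamp("2016-01-20").to_julian_date() - origin
prediction = slope * future_days + intercept
print(round(prediction, 1))
```

Julian dates and toordinal() ordinals differ only by a constant offset for midnight timestamps, so the fitted slope per day is the same either way.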
Answer by qmaruf
Linear regression works on numerical data, so the datetime type is not appropriate here. Split the date column into three separate columns (year, month and day), then remove the original column.
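A minimal sketch of that split using the pandas .dt accessor; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with an already-parsed datetime column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-15", "2016-01-16", "2016-01-17"]),
    "value": [1555, 1678, 1789],
})

# Split the datetime column into numeric year/month/day features.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day

# Drop the original datetime column so only numeric columns remain.
df = df.drop(columns="date")
print(df)
```

Keep in mind that a linear model then weights year, month and day independently: the step from January 31 to February 1 looks like month +1 and day -30 rather than one day.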
Answer by Thomas Vetterli
When using dt.datetime.toordinal, be careful: it converts only the date part and ignores hours, minutes, seconds, etc. To convert full datetime objects (including the time of day) to numbers, you can use something like:
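import time
df['timestamp'] = df['Datetime column'].apply(lambda x: time.mktime(x.timetuple()))  # seconds since the epoch; mktime treats naive datetimes as local time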
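A small sketch of the difference, on two hypothetical timestamps 12 hours apart. Instead of time.mktime, which interprets naive datetimes as local time, it counts seconds from the Unix epoch with pandas arithmetic; that substitution is an assumption of this sketch, chosen so the result does not depend on the machine's time zone.

```python
import datetime as dt

import pandas as pd

# Two hypothetical timestamps on the same day, 12 hours apart.
df = pd.DataFrame({
    "Datetime column": pd.to_datetime(["2016-01-15 08:00", "2016-01-15 20:00"]),
})

# toordinal() keeps only the date part, so both rows get the same number.
df["ordinal"] = df["Datetime column"].map(dt.datetime.toordinal)

# Whole seconds since 1970-01-01 keep the time of day; computed from the naive
# timestamps directly, so no local time zone is involved.
df["epoch_seconds"] = (df["Datetime column"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
print(df)
```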