Simple prediction using linear regression with Python
Notice: this page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/29623171/
Asked by Jimmys
data2 = pd.DataFrame(data1['kwh'])
data2
kwh
date
2012-04-12 14:56:50 1.256400
2012-04-12 15:11:55 1.430750
2012-04-12 15:27:01 1.369910
2012-04-12 15:42:06 1.359350
2012-04-12 15:57:10 1.305680
2012-04-12 16:12:10 1.287750
2012-04-12 16:27:14 1.245970
2012-04-12 16:42:19 1.282280
2012-04-12 16:57:24 1.365710
2012-04-12 17:12:28 1.320130
2012-04-12 17:27:33 1.354890
2012-04-12 17:42:37 1.343680
2012-04-12 17:57:41 1.314220
2012-04-12 18:12:44 1.311970
2012-04-12 18:27:46 1.338980
2012-04-12 18:42:51 1.357370
2012-04-12 18:57:54 1.328700
2012-04-12 19:12:58 1.308200
2012-04-12 19:28:01 1.341770
2012-04-12 19:43:04 1.278350
2012-04-12 19:58:07 1.253170
2012-04-12 20:13:10 1.420670
2012-04-12 20:28:15 1.292740
2012-04-12 20:43:15 1.322840
2012-04-12 20:58:18 1.247410
2012-04-12 21:13:20 0.568352
2012-04-12 21:28:22 0.317865
2012-04-12 21:43:24 0.233603
2012-04-12 21:58:27 0.229524
2012-04-12 22:13:29 0.236929
2012-04-12 22:28:34 0.233806
2012-04-12 22:43:38 0.235618
2012-04-12 22:58:43 0.229858
2012-04-12 23:13:43 0.235132
2012-04-12 23:28:46 0.231863
2012-04-12 23:43:55 0.237794
2012-04-12 23:59:00 0.229634
2012-04-13 00:14:02 0.234484
2012-04-13 00:29:05 0.234189
2012-04-13 00:44:09 0.237213
2012-04-13 00:59:09 0.230483
2012-04-13 01:14:10 0.234982
2012-04-13 01:29:11 0.237121
2012-04-13 01:44:16 0.230910
2012-04-13 01:59:22 0.238406
2012-04-13 02:14:21 0.250530
2012-04-13 02:29:24 0.283575
2012-04-13 02:44:24 0.302299
2012-04-13 02:59:25 0.322093
2012-04-13 03:14:30 0.327600
2012-04-13 03:29:31 0.324368
2012-04-13 03:44:31 0.301869
2012-04-13 03:59:42 0.322019
2012-04-13 04:14:43 0.325328
2012-04-13 04:29:43 0.306727
2012-04-13 04:44:46 0.299012
2012-04-13 04:59:47 0.303288
2012-04-13 05:14:48 0.326205
2012-04-13 05:29:49 0.344230
2012-04-13 05:44:50 0.353484
...
65701 rows × 1 columns
I have this dataframe with this index and one column. I want to do a simple prediction using linear regression with sklearn. I'm very confused and I don't know how to set X and y (I want the x values to be the time and the y values to be kwh). I'm new to Python, so any help is valuable. Thank you.
Accepted answer by TheWalkingCube
The first thing you have to do is split your data into two arrays, X and y. Each element of X will be a date, and the corresponding element of y will be the associated kwh.
Once you have that, you will want to use sklearn.linear_model.LinearRegression to do the regression. See the scikit-learn documentation for details.
As with every sklearn model, there are two steps. First, you fit the model to your data. Then you put the dates for which you want to predict the kwh into another array, X_predict, and predict the kwh using the predict method.
from sklearn.linear_model import LinearRegression
X = [] # put your dates in here
y = [] # put your kwh in here
model = LinearRegression()
model.fit(X, y)
X_predict = [] # put the dates of which you want to predict kwh here
y_predict = model.predict(X_predict)
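To make the placeholders above concrete, here is a minimal sketch that assumes the DataFrame has a DatetimeIndex and a kwh column, as in the question. LinearRegression needs numeric features, so the timestamps are converted to seconds elapsed since the first reading; that conversion and the helper name to_seconds are illustrative assumptions, not part of the original answer. The five sample rows are copied from the question's data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# A few rows copied from the question, standing in for the full 65701-row frame.
data2 = pd.DataFrame(
    {"kwh": [1.256400, 1.430750, 1.369910, 1.359350, 1.305680]},
    index=pd.to_datetime([
        "2012-04-12 14:56:50",
        "2012-04-12 15:11:55",
        "2012-04-12 15:27:01",
        "2012-04-12 15:42:06",
        "2012-04-12 15:57:10",
    ]),
)
data2.index.name = "date"

start = data2.index[0]  # reference point for the numeric time axis


def to_seconds(timestamps):
    """Hypothetical helper: seconds since the first reading, as an (n, 1) array."""
    elapsed = (pd.DatetimeIndex(timestamps) - start).total_seconds()
    return np.asarray(elapsed, dtype=float).reshape(-1, 1)


X = to_seconds(data2.index)   # 2-D feature array, one column
y = data2["kwh"].to_numpy()   # 1-D target array

model = LinearRegression()
model.fit(X, y)

# Predict the kwh for a timestamp that was not in the training rows.
X_predict = to_seconds(["2012-04-12 16:12:10"])
y_predict = model.predict(X_predict)
print(y_predict)
```

Measuring time relative to the first reading keeps the feature values small; any consistent numeric encoding (for example Unix epoch seconds) would work, as long as the same one is used for fitting and predicting.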
Answer by mrg
The predict() function takes a 2-dimensional array as its argument. So, to predict a value with simple linear regression, you have to pass the input inside a 2-dimensional array. A bare timestamp is not valid Python, so it first has to be converted to the same numeric form the model was trained on (Unix epoch seconds are assumed here, with pandas imported as pd):
model.predict([[pd.Timestamp('2012-04-13 05:55:30').timestamp()]])
For multiple linear regression, pass one value per feature:
model.predict([[pd.Timestamp('2012-04-13 05:44:50').timestamp(), 0.327433]])
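The shape requirement can be checked on a toy model. The sketch below uses made-up numeric data (y = 2x + 1), not the question's dataset, to show that predict expects an array of shape (n_samples, n_features) and that a flat array can be turned into a single column with NumPy's reshape(-1, 1).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration: y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (4, 1): 4 samples, 1 feature
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression().fit(X, y)

# One sample with one feature must still be 2-dimensional: [[value]].
single = model.predict([[4.0]])

# Several values held in a flat array: reshape them into one column first.
flat = np.array([5.0, 6.0])
several = model.predict(flat.reshape(-1, 1))

print(single, several)
```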
Answer by Ankush Shrivastava
Linear Regression:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data=pd.read_csv('Salary_Data.csv')
X=data.iloc[:,:-1].values
y=data.iloc[:,1].values
# split the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=10,random_state=0)
from sklearn.linear_model import LinearRegression
regressor=LinearRegression()
regressor.fit(X_train,Y_train)
y_pre=regressor.predict(X_test)
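One detail of the split used here: an integer test_size is read as an absolute number of test samples, while a float between 0 and 1 is read as a fraction of the dataset. A small sketch with made-up data (30 rows, chosen arbitrarily) shows the difference.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 30 samples with a single feature.
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

# Integer test_size: exactly 10 samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0
)

# Float test_size: 20% of the samples go to the test set.
X_train_f, X_test_f, y_train_f, y_test_f = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(len(X_test), len(X_test_f))
```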
Answer by wins999
You can have a look at my code on GitHub, where I predict temperature from the chirp rate of crickets using a simple linear regression model. I have explained the code with comments.
#Import the libraries required
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the excel data
dataset = pd.read_excel(r'D:\MachineLearing\Machine Learning A-Z Template Folder\Part 2 - Regression\Section 4 - Simple Linear Regression\CricketChirpsVs.Temperature.xls')  # raw string so the backslashes are not treated as escapes
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
#Split the data into train and test dataset
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=1/3,random_state=42)
#Fitting Simple Linear regression data model to train data set
from sklearn.linear_model import LinearRegression
regressorObject=LinearRegression()
regressorObject.fit(x_train,y_train)
#predict the test set
y_pred_test_data=regressorObject.predict(x_test)
# Visualising the Training set results in a scatter plot
plt.scatter(x_train, y_train, color = 'red')
plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
plt.title('Cricket Chirps vs Temperature (Training set)')
plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
plt.ylabel('Temperature (in degrees Fahrenheit)')
plt.show()
# Visualising the test set results in a scatter plot
plt.scatter(x_test, y_test, color = 'red')
plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
plt.title('Cricket Chirps vs Temperature (Test set)')
plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
plt.ylabel('Temperature (in degrees Fahrenheit)')
plt.show()
For more information please visit
https://github.com/wins999/Cricket_Chirps_Vs_Temprature--Simple-Linear-Regression-in-Python-
Answer by Sarang Narkhede
You can use the following code.
import pandas as pd
from sklearn.linear_model import LinearRegression # to build the linear regression model
from sklearn.model_selection import train_test_split # to split the dataset
data2 = pd.DataFrame(data1['kwh'])
data2 = data2.reset_index() # creates a new index (0 to 65700), so the date becomes an ordinary column
# LinearRegression needs a 2-D numeric X, so convert the dates to seconds since the Unix epoch
dates = pd.to_datetime(data2.iloc[:,0]) # date column
X = (dates - pd.Timestamp('1970-01-01')).dt.total_seconds().to_frame()
y = data2.iloc[:,-1] # kwh column
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.80, random_state=20)
linearModel = LinearRegression()
linearModel.fit(Xtrain, ytrain)
ypred = linearModel.predict(Xtest)
Here ypred holds the predicted kwh values for the test set (linear regression returns continuous estimates, not probabilities).
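A LinearRegression model returns continuous estimates of the target (here, kwh), so its output is usually judged with a regression metric. The following sketch uses synthetic data in place of the question's frame; the slope, noise level and seed are assumptions chosen only for illustration, while mean_absolute_error and r2_score come from sklearn.metrics.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(20)

# Synthetic stand-in for (time, kwh): a linear trend plus small noise.
X = np.linspace(0.0, 100.0, 200).reshape(-1, 1)
y = 0.05 * X.ravel() + 1.0 + rng.normal(scale=0.1, size=200)

Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, train_size=0.80, random_state=20
)

linearModel = LinearRegression().fit(Xtrain, ytrain)
ypred = linearModel.predict(Xtest)  # continuous predictions, one per test row

mae = mean_absolute_error(ytest, ypred)  # average absolute error, in target units
r2 = r2_score(ytest, ypred)              # share of variance explained
print(f"MAE: {mae:.3f}, R^2: {r2:.3f}")
```

On the real kwh series a straight line over time is likely to fit poorly, since the readings jump between levels rather than following a trend; a low R^2 there would reflect the model choice, not a coding error.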