Plot sklearn LinearRegression output with matplotlib
Source: http://stackoverflow.com/questions/46382550/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Mayank Raj
After importing the file, I separate the x_values and y_values using numpy as follows:
import pandas as pd
from sklearn import linear_model
from matplotlib import pyplot
import numpy as np
#read data
dataframe = pd.read_csv('challenge_dataset.txt')
dataframe.columns=['Brain','Body']
x_values=np.array(dataframe['Brain'],dtype=np.float64).reshape(1,-1)
y_values=np.array(dataframe['Body'],dtype=np.float64).reshape(1,-1)
#train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)
prediction=body_reg.predict(x_values)
print(prediction)
#visualize results
pyplot.scatter(x_values, y_values)
pyplot.plot(x_values,prediction)
pyplot.show()
I get the plot shown in the following image, which doesn't show the line of best fit. Also, when I print the value of 'prediction', it shows the same values as 'y_values'.
In contrast, when I use the following code, I get the regression line.
#read data
dataframe = pd.read_csv('challenge_dataset.txt')
dataframe.columns=['Brain','Body']
x_values=dataframe[['Brain']]
y_values=dataframe[['Body']]
Why is that?
Thanks in advance.
Answered by ImportanceOfBeingErnest
linear_model.LinearRegression().fit(X,y) expects its arguments
X : numpy array or sparse matrix of shape [n_samples,n_features]
y : numpy array of shape [n_samples, n_targets]
Here you have 1 "feature" and 1 "target", hence the expected shape of each input would be (n_samples,1).
While this is the case for
x_values=dataframe[['Brain']]
y_values=dataframe[['Body']]
the shape of np.array(dataframe['Brain'],dtype=np.float64).reshape(1,-1) is (1,n_samples). The model therefore sees a single sample with n_samples features and n_samples targets, fits it exactly, and returns predictions identical to y_values.
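To make the shape difference concrete, here is a minimal sketch that compares the two layouts and fits a model on the row layout. The data is synthetic (an assumption standing in for challenge_dataset.txt, which is not available here), and the names x_row, y_row, x_col and row_reg are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic stand-in for challenge_dataset.txt (assumed data, not the original file)
rng = np.random.default_rng(0)
brain = rng.random(25)
dataframe = pd.DataFrame({'Brain': brain, 'Body': 2 * brain + rng.random(25)})

# reshape(1,-1) puts all 25 values into a single row
x_row = np.array(dataframe['Brain'], dtype=np.float64).reshape(1, -1)
y_row = np.array(dataframe['Body'], dtype=np.float64).reshape(1, -1)
print(x_row.shape)  # (1, 25)

# Double-bracket selection keeps one column with 25 rows
x_col = dataframe[['Brain']]
print(x_col.shape)  # (25, 1)

# Fitting on the row layout treats the data as one sample with 25 features
row_reg = linear_model.LinearRegression().fit(x_row, y_row)
row_prediction = row_reg.predict(x_row)

# Check whether the prediction simply reproduces the targets
print(np.allclose(row_prediction, y_row))
```

With only one sample there is nothing to regress on, so the fit reduces to memorising the targets, which matches what the question reports.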
Another option to obtain the desired shape from the dataframe columns would be to turn them into a 2D array by adding a new axis:
x_values=dataframe['Brain'].values[:,np.newaxis]
y_values=dataframe['Body'].values[:,np.newaxis]
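As a rough illustration, the new-axis form above should give the same (n_samples,1) array as a couple of other common spellings. The three-row dataframe below uses made-up values, and the names a, b and c are only for this sketch.

```python
import numpy as np
import pandas as pd

# Made-up values, only to show the resulting shapes
dataframe = pd.DataFrame({'Brain': [1.0, 2.0, 3.0], 'Body': [2.0, 4.1, 5.9]})

a = dataframe['Brain'].values[:, np.newaxis]   # new axis, as in the answer
b = dataframe['Brain'].values.reshape(-1, 1)   # -1 lets numpy infer n_samples
c = dataframe[['Brain']].to_numpy()            # double brackets keep 2D

print(a.shape, b.shape, c.shape)  # (3, 1) (3, 1) (3, 1)
```

Note that reshape(-1, 1) is the mirror image of the reshape(1, -1) used in the question: the -1 goes on the samples axis, not the features axis.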
Note that in order to show a nice line, you would probably want to sort the x values.
import pandas as pd
from sklearn import linear_model
from matplotlib import pyplot
import numpy as np
#generate random sample data in place of the file
x = np.random.rand(25,2)
x[:,1] = 2*x[:,0]+np.random.rand(25)
dataframe = pd.DataFrame(x,columns=['Brain','Body'])
x_values=dataframe['Brain'].values[:,np.newaxis]
y_values=dataframe['Body'].values[:,np.newaxis]
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)
prediction=body_reg.predict(np.sort(x_values, axis=0))
pyplot.scatter(x_values, y_values)
pyplot.plot(np.sort(x_values, axis=0),prediction)
pyplot.show()
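The example above sorts x with np.sort. Two variations may be worth considering, sketched here on synthetic data under the same assumptions: np.argsort gives an index order that can also be reused to reorder other arrays, and because the fitted model is a straight line, predicting at just the smallest and largest x is enough to draw it. The names order, x_sorted, x_ends, line_sorted and line_ends are illustrative.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data with shape (n_samples, 1), as in the example above
rng = np.random.default_rng(1)
x_values = rng.random((25, 1))
y_values = 2 * x_values + rng.random((25, 1))

body_reg = linear_model.LinearRegression().fit(x_values, y_values)

# Variation 1: argsort returns the row order that sorts x
order = np.argsort(x_values[:, 0])
x_sorted = x_values[order]
line_sorted = body_reg.predict(x_sorted)

# Variation 2: a straight line is fully determined by its two end points
x_ends = np.array([[x_values.min()], [x_values.max()]])
line_ends = body_reg.predict(x_ends)
```

Either pair can then be passed to pyplot.plot, for example pyplot.plot(x_sorted, line_sorted) or pyplot.plot(x_ends, line_ends), after the scatter call.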