pandas: plotting sklearn LinearRegression output with matplotlib

Question

Asked by Mayank Raj

After importing the file, I separate x_values and y_values using numpy as follows:

import pandas as pd
from sklearn import linear_model
from  matplotlib import pyplot 
import numpy as np

#read data
dataframe = pd.read_csv('challenge_dataset.txt')
dataframe.columns=['Brain','Body']
x_values=np.array(dataframe['Brain'],dtype=np.float64).reshape(1,-1)
y_values=np.array(dataframe['Body'],dtype=np.float64).reshape(1,-1)

#train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)
prediction=body_reg.predict(x_values)

print(prediction)
#visualize results
pyplot.scatter(x_values, y_values)
pyplot.plot(x_values,prediction)
pyplot.show()

The plot I get does not show the line of best fit. Also, when I print 'prediction', its values are the same as 'y_values'.

In contrast, when I use the following code, I get the regression line.

#read data
dataframe = pd.read_csv('challenge_dataset.txt')
dataframe.columns=['Brain','Body']
x_values=dataframe[['Brain']]
y_values=dataframe[['Body']]

Why is that?

Thanks in advance.

Answer 1

Answered by ImportanceOfBeingErnest

linear_model.LinearRegression().fit(X, y) expects its arguments to have the following shapes:

X: numpy array or sparse matrix of shape [n_samples, n_features]
y: numpy array of shape [n_samples, n_targets]

Here you have 1 "feature" and 1 "target", so the expected shape of both inputs is (n_samples, 1).

While this is the case for

x_values=dataframe[['Brain']]
y_values=dataframe[['Body']]

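the shape of np.array(dataframe['Brain'],dtype=np.float64).reshape(1,-1) is (1, n_samples). The model is therefore fitted on a single sample with n_samples features and n_samples targets, which it reproduces exactly; this is why 'prediction' equals 'y_values' and no regression line appears.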
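
The difference between the two layouts can be checked directly. The following is a minimal sketch, assuming a small made-up DataFrame in place of challenge_dataset.txt (the numbers are illustrative only); it prints both shapes and confirms that the row layout makes the model return its own targets.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Hypothetical stand-in for challenge_dataset.txt: 5 rows, 2 columns.
dataframe = pd.DataFrame({"Brain": [1.0, 2.0, 3.0, 4.0, 5.0],
                          "Body": [2.1, 3.9, 6.2, 8.1, 9.8]})

# Row layout from the question: one sample with five features/targets.
row_x = np.array(dataframe["Brain"], dtype=np.float64).reshape(1, -1)
row_y = np.array(dataframe["Body"], dtype=np.float64).reshape(1, -1)

# Column layout from the working code: five samples with one feature.
col_x = dataframe[["Brain"]]

print(row_x.shape)  # (1, 5)
print(col_x.shape)  # (5, 1)

# With the row layout the model sees a single sample and reproduces it.
row_model = linear_model.LinearRegression().fit(row_x, row_y)
row_prediction = row_model.predict(row_x)
print(np.allclose(row_prediction, row_y))  # True
```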

Another option to obtain the desired shape from the dataframe columns is to add a new axis, which turns each column into a 2D array:

x_values=dataframe['Brain'].values[:,np.newaxis]
y_values=dataframe['Body'].values[:,np.newaxis]
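
The numpy approach from the question can also be kept by reshaping to a single column instead of a single row. A minimal sketch, again assuming a small made-up DataFrame rather than the real dataset, showing that reshape(-1, 1) gives the same (n_samples, 1) array as indexing with np.newaxis:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real file.
dataframe = pd.DataFrame({"Brain": [1.0, 2.0, 3.0],
                          "Body": [2.0, 4.1, 5.9]})

# New-axis indexing, as in the answer.
with_newaxis = dataframe["Brain"].values[:, np.newaxis]

# The question's code with reshape(-1, 1) instead of reshape(1, -1).
with_reshape = np.array(dataframe["Brain"], dtype=np.float64).reshape(-1, 1)

print(with_newaxis.shape)  # (3, 1)
print(with_reshape.shape)  # (3, 1)
```

In reshape(-1, 1), the -1 lets numpy infer the number of rows, so the same line works for any number of samples.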

Note that in order to show a nice line, you would probably want to sort the x values.

import pandas as pd
from sklearn import linear_model
from matplotlib import pyplot
import numpy as np

# generate sample data: 'Body' is 2 * 'Brain' plus uniform noise
x = np.random.rand(25,2)
x[:,1] = 2*x[:,0]+np.random.rand(25)
dataframe = pd.DataFrame(x,columns=['Brain','Body'])

# column vectors of shape (n_samples, 1)
x_values=dataframe['Brain'].values[:,np.newaxis]
y_values=dataframe['Body'].values[:,np.newaxis]

# train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)

# predict on sorted x values so the line is drawn from left to right
x_sorted = np.sort(x_values, axis=0)
prediction=body_reg.predict(x_sorted)

# visualize results
pyplot.scatter(x_values, y_values)
pyplot.plot(x_sorted,prediction)
pyplot.show()
