Python scikit-learn: how to scale back the 'y' predicted result

Question

Asked by Hookstark

I'm trying to learn scikit-learn and machine learning using the Boston Housing data set.

# Split the initial dataset ('housing_X' and 'housing_y')
# (sklearn.cross_validation is the pre-0.20 module; newer releases use sklearn.model_selection)
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(housing_X, housing_y, test_size=0.25, random_state=33)

# Scale both X and y, fitting the scalers on the training split only
from sklearn.preprocessing import StandardScaler
scalerX = StandardScaler().fit(X_train)
scalery = StandardScaler().fit(y_train)
X_train = scalerX.transform(X_train)
y_train = scalery.transform(y_train)
X_test = scalerX.transform(X_test)
y_test = scalery.transform(y_test)

# Create and train the model
# ('squared_loss' was renamed 'squared_error' in scikit-learn 1.0;
#  train_and_evaluate is a helper that is not shown in the question)
from sklearn import linear_model
clf_sgd = linear_model.SGDRegressor(loss='squared_loss', penalty=None, random_state=42)
train_and_evaluate(clf_sgd, X_train, y_train)

Using this new model, clf_sgd, I am trying to predict y for the first instance of X_train.

X_new_scaled = X_train[0]
print(X_new_scaled)
y_new = clf_sgd.predict(X_new_scaled)
print(y_new)

However, the result looks odd to me: 1.34032174 instead of something in the 20-30 range of the house prices.

[-0.32076092  0.35553428 -1.00966618 -0.28784917  0.87716097  1.28834383
  0.4759489  -0.83034371 -0.47659648 -0.81061061 -2.49222645  0.35062335
 -0.39859013]
[ 1.34032174]

I guess that this 1.34032174 value should be scaled back, but I have not managed to figure out how to do it. Any tip is welcome. Thank you very much.

Answer 1

Answered by Ryan

You can call inverse_transform on your scalery object:

y_new_inverse = scalery.inverse_transform(y_new)
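For readers on a recent scikit-learn, here is a minimal sketch of the same round trip. It makes several assumptions that are not in the question: the data is a synthetic stand-in (X, y and the coefficients are made up), y is reshaped to a column because current scalers expect 2-D input, and the model is fitted directly with fit() rather than through the asker's train_and_evaluate helper. It also checks that inverse_transform is just the standardization formula reversed, y = y_scaled * scale_ + mean_.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption: not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 25.0 + X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=200)

scalerX = StandardScaler().fit(X)
# Recent scikit-learn scalers expect 2-D input, so y is passed as a column.
scalery = StandardScaler().fit(y.reshape(-1, 1))

X_scaled = scalerX.transform(X)
y_scaled = scalery.transform(y.reshape(-1, 1)).ravel()

# The default loss of SGDRegressor is the squared error, so it is left implicit.
clf_sgd = SGDRegressor(penalty=None, random_state=42)
clf_sgd.fit(X_scaled, y_scaled)

# Predict for the first row (kept 2-D), then undo the scaling of y.
y_new_scaled = clf_sgd.predict(X_scaled[:1])
y_new = scalery.inverse_transform(y_new_scaled.reshape(-1, 1)).ravel()

# The same inversion written out by hand.
y_new_manual = y_new_scaled * scalery.scale_[0] + scalery.mean_[0]

print(y_new_scaled)  # prediction in standardized units
print(y_new)         # prediction back in the original units of y
```

The first printed value is in standard deviations from the mean of y, which is why the question saw a number near 1 rather than a price; the second is back on the original scale.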

Answer 2

Answered by Maartenk

A bit late to the game: just don't scale your y. By scaling y you lose your units. The regression, or loss optimization, is determined by the relative differences between the features. By the way, for house prices (or any other monetary value) it is common practice to take the logarithm. You then need to apply numpy.exp() to get back to the actual dollars, euros, yen...
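The logarithm suggestion can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: the prices are synthetic and strictly positive, the names X, price and model are made up, and only the features are standardized while the target is log-transformed.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic, strictly positive "prices" (assumption: not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
log_price = 3.0 + X @ np.array([0.2, -0.1, 0.05]) + rng.normal(scale=0.05, size=200)
price = np.exp(log_price)

# Scale only the features; the target is log-transformed instead of standardized.
X_scaled = StandardScaler().fit_transform(X)
model = SGDRegressor(penalty=None, random_state=42)
model.fit(X_scaled, np.log(price))

# The model predicts log(price); numpy.exp maps it back to currency units.
pred_log = model.predict(X_scaled[:1])
pred_price = np.exp(pred_log)

print(pred_log)    # prediction on the log scale
print(pred_price)  # prediction in the original currency units
```

scikit-learn also ships TransformedTargetRegressor in sklearn.compose, which can wrap this transform-and-invert step so that predict() returns values in the original units.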
