Python scikit-learn: how to scale back the 'y' predicted result
原文地址: http://stackoverflow.com/questions/38058774/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
scikit-learn: how to scale back the 'y' predicted result
Asked by Hookstark
I'm trying to learn scikit-learn and Machine Learning by using the Boston Housing Data Set.
# I split the initial dataset ('housing_X' and 'housing_y')
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(housing_X, housing_y, test_size=0.25, random_state=33)
# I scaled those two datasets
from sklearn.preprocessing import StandardScaler
scalerX = StandardScaler().fit(X_train)
scalery = StandardScaler().fit(y_train)
X_train = scalerX.transform(X_train)
y_train = scalery.transform(y_train)
X_test = scalerX.transform(X_test)
y_test = scalery.transform(y_test)
# I created the model
from sklearn import linear_model
clf_sgd = linear_model.SGDRegressor(loss='squared_loss', penalty=None, random_state=42)
train_and_evaluate(clf_sgd,X_train,y_train)
Based on this new model clf_sgd, I am trying to predict the y based on the first instance of X_train.
X_new_scaled = X_train[0]
print (X_new_scaled)
y_new = clf_sgd.predict(X_new_scaled)
print (y_new)
However, the result looks quite odd to me (1.34032174 instead of 20-30, the range of the house prices):
[-0.32076092 0.35553428 -1.00966618 -0.28784917 0.87716097 1.28834383
0.4759489 -0.83034371 -0.47659648 -0.81061061 -2.49222645 0.35062335
-0.39859013]
[ 1.34032174]
I guess that this 1.34032174 value should be scaled back, but I have been trying to figure out how to do it with no success. Any tip is welcome. Thank you very much.
Answered by Ryan
You can use inverse_transform on your scalery object:
y_new_inverse = scalery.inverse_transform(y_new)
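For readers who want to try this end to end, here is a minimal sketch of the round trip. It assumes a recent scikit-learn release, where StandardScaler expects a 2D array (hence the reshape calls), and it uses made-up synthetic data in place of housing_X and housing_y, so the numbers will not match the question.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (an assumption, not the Boston set):
# three features and a target centred around 25.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 3))
y_train = 25.0 + X_train @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=200)

# Fit one scaler per array; the 1D target is reshaped into a single column.
scalerX = StandardScaler().fit(X_train)
scalery = StandardScaler().fit(y_train.reshape(-1, 1))
X_train_scaled = scalerX.transform(X_train)
y_train_scaled = scalery.transform(y_train.reshape(-1, 1)).ravel()

# Train on the scaled data, as in the question.
clf_sgd = SGDRegressor(penalty=None, random_state=42)
clf_sgd.fit(X_train_scaled, y_train_scaled)

# Predict for the first instance (kept 2D with a slice), then undo the scaling.
y_new = clf_sgd.predict(X_train_scaled[:1])
y_new_inverse = scalery.inverse_transform(y_new.reshape(-1, 1)).ravel()
print(y_new, y_new_inverse)
```

The first printed value is in standardized units, the second is back in the units of y_train.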
Answered by Maartenk
A bit late to the game: just don't scale your y. By scaling y you actually lose your units. The regression or loss optimization is actually determined by the relative differences between the features. By the way, for house prices (or any other monetary value) it is common practice to take the logarithm. You then obviously need to apply numpy.exp() to get back to the actual dollars/euros/yen...
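A rough sketch of that suggestion: scale only the features, train on the logarithm of the target, and apply numpy.exp() to the predictions. The data below is synthetic and purely illustrative, and the pipeline layout is one possible arrangement, not something the answer prescribes.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up positive "prices" (an assumption for illustration only).
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 3))
y_train = np.exp(3.0 + X_train @ np.array([0.2, -0.1, 0.05]))

# Scale the features inside a pipeline; the target is left unscaled.
model = make_pipeline(StandardScaler(), SGDRegressor(penalty=None, random_state=42))

# Train on log-prices instead of raw prices.
model.fit(X_train, np.log(y_train))

# Predictions come out in log units; exp() brings them back to prices.
log_pred = model.predict(X_train[:1])
price_pred = np.exp(log_pred)
print(price_pred)
```

Because the target is never standardized, no inverse_transform step is needed here; the only conversion is the exponential.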