Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/32401493/
How to create/customize your own scorer function in scikit-learn?
Asked by daniel2014
I am using Support Vector Regression as an estimator in GridSearchCV. But I want to change the error function: instead of using the default (R-squared: coefficient of determination), I would like to define my own custom error function.
I tried to make one with make_scorer, but it didn't work.
I read the documentation and found that it's possible to create custom estimators, but I don't need to remake the entire estimator - only the error/scoring function.
I think I can do it by defining a callable as a scorer, like it says in the docs.
But I don't know how to use it with an estimator (in my case, SVR). Would I have to switch to a classifier (such as SVC)? And how would I use it?
My custom error function is as follows:
def my_custom_loss_func(X_train_scaled, Y_train_scaled):
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i
    return error
The variable M isn't actually null/zero. I've just set it to zero for simplicity.
Would anyone be able to show an example application of this custom scoring function? Thanks for your help!
Accepted answer by Jamie Bull
As you saw, this is done by using make_scorer (docs).
import numpy as np
from sklearn.metrics import make_scorer
# sklearn.grid_search was removed in scikit-learn 0.20; GridSearchCV now lives here
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.RandomState(1)

def my_custom_loss_func(X_train_scaled, Y_train_scaled):
    # make_scorer calls this function as score_func(y_true, y_pred)
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        # Assumption: a sample matching none of the cases below adds nothing
        # (the original left error_i undefined in that situation)
        error_i = 0
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i
    return error

# Generate sample data
X = 5 * rng.rand(10000, 1)
y = np.sin(X).ravel()

# Add noise to every fifth target (integer division keeps the size an int on Python 3)
y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))

train_size = 100
my_scorer = make_scorer(my_custom_loss_func, greater_is_better=True)

svr = GridSearchCV(SVR(kernel='rbf', gamma=0.1),
                   scoring=my_scorer,
                   cv=5,
                   param_grid={"C": [1e0, 1e1, 1e2, 1e3],
                               "gamma": np.logspace(-2, 2, 5)})
svr.fit(X[:train_size], y[:train_size])

print(svr.best_params_)
print(svr.score(X[train_size:], y[train_size:]))
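As an aside, the per-sample loop in my_custom_loss_func can be slow on large arrays. The following is a minimal sketch of the same three cases written with NumPy boolean masks. The name vectorized_custom_loss and the M keyword argument are hypothetical, and the sketch assumes that samples matching none of the three cases contribute zero to the total.

```python
import numpy as np
from sklearn.metrics import make_scorer


def vectorized_custom_loss(y_true, y_pred, M=0.0):
    """Hypothetical vectorized variant of the loop-based loss."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    z = y_pred - M
    diff = np.abs(y_pred - y_true)

    # The three cases from the loop, expressed as boolean masks
    both_above = (y_true > M) & (y_pred > M)
    over = both_above & (y_true > y_pred)    # truth above prediction
    under = both_above & (y_true < y_pred)   # truth below prediction
    below = (y_true > M) & (y_pred < M)      # prediction fell below M

    # Assumption: samples matching no case contribute 0
    err = np.zeros_like(diff)
    err[over] = diff[over] ** (2 * np.exp(z[over]))
    err[under] = -(diff[under] ** (2 * np.exp(z[under])))
    err[below] = -(diff[below] ** (2 * np.exp(-z[below])))
    return err.sum()


# Wrapped exactly like the loop version
vectorized_scorer = make_scorer(vectorized_custom_loss, greater_is_better=True)
```

The resulting vectorized_scorer can be passed to GridSearchCV through the scoring argument in the same way as my_scorer.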
Answer by alichaudry
Jamie has a fleshed-out example, but here's an example using make_scorer straight from the scikit-learn documentation:
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer

def my_custom_loss_func(ground_truth, predictions):
    diff = np.abs(ground_truth - predictions).max()
    return np.log(1 + diff)

# loss will negate the return value of my_custom_loss_func,
# which will be np.log(2), 0.693, given the values for X
# and y defined below.
loss = make_scorer(my_custom_loss_func, greater_is_better=False)
score = make_scorer(my_custom_loss_func, greater_is_better=True)

X = [[1], [1]]  # two samples with one feature each
y = [0, 1]      # true labels

clf = DummyClassifier(strategy='most_frequent', random_state=0)
clf = clf.fit(X, y)
loss(clf, X, y)   # -0.69...
score(clf, X, y)  # 0.69...
When defining a custom scorer via sklearn.metrics.make_scorer, the convention is that custom functions ending in _score return a value to maximize, while functions ending in _loss or _error return a value to minimize. You tell make_scorer which case applies through its greater_is_better parameter: True for functions where higher values are better, and False for functions where lower values are better. With greater_is_better=False the scorer negates the function's return value, so GridSearchCV can always maximize the score and still optimize in the appropriate direction.
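To make the sign convention concrete, here is a small sketch that wraps a loss with greater_is_better=False and evaluates it with cross_val_score. The loss max_abs_error, the synthetic data, and the SVR settings are illustrative assumptions, not part of the original answers.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR


def max_abs_error(y_true, y_pred):
    """Hypothetical loss: largest absolute residual (lower is better)."""
    return np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred)))


# greater_is_better=False makes the scorer return the negated loss
neg_max_abs_error = make_scorer(max_abs_error, greater_is_better=False)

# Synthetic regression data, for illustration only
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel()

scores = cross_val_score(
    SVR(kernel="rbf", C=10, gamma=0.1), X, y,
    scoring=neg_max_abs_error, cv=5,
)
print(scores)  # one value per fold, each <= 0 because the loss is negated
```

Because the scores are negated losses, a value closer to zero means a smaller maximum error.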
You can then convert your function into a scorer as follows:
import numpy as np
# make_scorer is imported from sklearn.metrics; the sklearn.metrics.scorer module is no longer available
from sklearn.metrics import make_scorer

def custom_loss_func(X_train_scaled, Y_train_scaled):
    # make_scorer calls this function as score_func(y_true, y_pred)
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        # Assumption: a sample matching none of the cases below adds nothing
        error_i = 0
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i
    return error

custom_scorer = make_scorer(custom_loss_func, greater_is_better=True)
Then pass custom_scorer into GridSearchCV as you would any other scoring function, alongside your estimator and parameter grid: clf = GridSearchCV(estimator, param_grid, scoring=custom_scorer).
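As the question mentions, the scoring argument also accepts a plain callable with the signature (estimator, X, y), which avoids make_scorer altogether. Below is a minimal sketch of that route with SVR, so no switch to a classifier is needed. The scorer name neg_max_abs_error_scorer, the synthetic data, and the parameter grid are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR


def neg_max_abs_error_scorer(estimator, X, y):
    """Hypothetical scorer: negated largest absolute residual (higher is better)."""
    y_pred = estimator.predict(X)
    return -float(np.max(np.abs(y - y_pred)))


# Synthetic regression data, for illustration only
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel()

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "gamma": np.logspace(-2, 2, 5)},
    scoring=neg_max_abs_error_scorer,  # callable used directly as the scorer
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # negated loss, so values closer to 0 are better
```

A callable scorer receives the fitted estimator itself, so it can also use information that a (y_true, y_pred) function cannot see, such as the input features.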