Python: How to create/customize your own scorer function in scikit-learn?

Question

Asked by daniel2014

I am using Support Vector Regression as an estimator in GridSearchCV. But I want to change the error function: instead of using the default (R-squared, the coefficient of determination), I would like to define my own custom error function.

I tried to make one with make_scorer, but it didn't work.

I read the documentation and found that it's possible to create custom estimators, but I don't need to remake the entire estimator - only the error/scoring function.

I think I can do it by defining a callable as a scorer, like it says in the docs.

But I don't know how to use an estimator: in my case SVR. Would I have to switch to a classifier (such as SVC)? And how would I use it?

My custom error function is as follows:

def my_custom_loss_func(X_train_scaled, Y_train_scaled):
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        error_i = 0  # assumption: samples matching none of the cases add nothing
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i  # accumulate inside the loop, once per sample
    return error

The variable M isn't null/zero. I've just set it to zero for simplicity.

Would anyone be able to show an example application of this custom scoring function? Thanks for your help!

Answer 1

Accepted answer by Jamie Bull

As you saw, this is done by using make_scorer (docs).

from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
from sklearn.metrics import make_scorer
from sklearn.svm import SVR

import numpy as np

rng = np.random.RandomState(1)

def my_custom_loss_func(X_train_scaled, Y_train_scaled):
    # make_scorer calls this as func(y_true, y_pred): the first argument
    # receives the true targets, the second the model's predictions.
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        error_i = 0  # assumption: samples matching none of the cases add nothing
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i  # accumulate inside the loop, once per sample
    return error

# Generate sample data
X = 5 * rng.rand(10000, 1)
y = np.sin(X).ravel()

# Add noise to targets
y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))  # integer division: rand() needs an int size

train_size = 100

my_scorer = make_scorer(my_custom_loss_func, greater_is_better=True)

svr = GridSearchCV(SVR(kernel='rbf', gamma=0.1),
                   scoring=my_scorer,
                   cv=5,
                   param_grid={"C": [1e0, 1e1, 1e2, 1e3],
                               "gamma": np.logspace(-2, 2, 5)})

svr.fit(X[:train_size], y[:train_size])

print(svr.best_params_)
print(svr.score(X[train_size:], y[train_size:]))
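Because GridSearchCV calls the scoring function once per fold and parameter combination, the per-sample Python loop can become slow on larger data. Below is a minimal sketch of a vectorized variant, not the original author's code. It assumes that samples matching none of the three cases contribute zero and that the error is accumulated per sample; the names loop_loss and vectorized_loss and the M keyword argument are hypothetical and introduced only for this illustration.

```python
import numpy as np


def loop_loss(y_true, y_pred, M=0.0):
    # Reference loop form (assumption: unmatched samples contribute 0).
    total = 0.0
    for a, b in zip(y_true, y_pred):
        z = b - M
        e = 0.0
        if a > M and b > M and a - b > 0:
            e = abs(b - a) ** (2 * np.exp(z))
        if a > M and b > M and a - b < 0:
            e = -(abs(b - a) ** (2 * np.exp(z)))
        if a > M and b < M:
            e = -(abs(b - a) ** (2 * np.exp(-z)))
        total += e
    return total


def vectorized_loss(y_true, y_pred, M=0.0):
    # Same three cases, evaluated on whole arrays at once.
    a = np.asarray(y_true, dtype=float)
    b = np.asarray(y_pred, dtype=float)
    z = b - M
    gap = np.abs(b - a)
    both_above = (a > M) & (b > M)
    conditions = [both_above & (a > b), both_above & (a < b), (a > M) & (b < M)]
    choices = [
        gap ** (2 * np.exp(z)),
        -(gap ** (2 * np.exp(z))),
        -(gap ** (2 * np.exp(-z))),
    ]
    # The conditions are mutually exclusive; everything else maps to 0.
    return float(np.select(conditions, choices, default=0.0).sum())


rng = np.random.RandomState(0)
y_true = rng.uniform(-1.0, 2.0, size=50)
y_pred = rng.uniform(-1.0, 2.0, size=50)
print(loop_loss(y_true, y_pred), vectorized_loss(y_true, y_pred))
```

Either function can be wrapped with make_scorer in the same way, since both take (y_true, y_pred) as their first two arguments.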

Answer 2

Answer by alichaudry

Jamie has a fleshed-out example, but here's an example using make_scorer straight from the scikit-learn documentation:

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer

def my_custom_loss_func(ground_truth, predictions):
    diff = np.abs(ground_truth - predictions).max()
    return np.log(1 + diff)

# loss will negate the return value of my_custom_loss_func,
#  which will be np.log(2), 0.693, given the values for ground_truth
#  and predictions defined below.
loss  = make_scorer(my_custom_loss_func, greater_is_better=False)
score = make_scorer(my_custom_loss_func, greater_is_better=True)
ground_truth = [[1], [1]]  # used as X: two samples, one feature each
predictions  = [0, 1]      # used as y: the targets the classifier is fit on
clf = DummyClassifier(strategy='most_frequent', random_state=0)
clf = clf.fit(ground_truth, predictions)
loss(clf, ground_truth, predictions)   # -0.69...

score(clf, ground_truth, predictions)  # 0.69...

When defining a custom scorer via sklearn.metrics.make_scorer, the convention is that functions whose names end in _score return a value to maximize, while functions ending in _loss or _error return a value to minimize. You tell make_scorer which kind you have through the greater_is_better parameter: True for functions where higher values are better, False for functions where lower values are better. With greater_is_better=False the scorer negates the function's return value, so GridSearchCV, which always maximizes the score, optimizes in the appropriate direction.
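The same sign convention applies to regressors, so there is no need to switch to a classifier. Here is a small sketch using a DummyRegressor; the metric max_abs_error is a hypothetical stand-in chosen only because its value is easy to check by hand.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer


def max_abs_error(y_true, y_pred):
    # Hypothetical metric for illustration: the largest absolute residual.
    return float(np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# strategy="mean" always predicts the training mean, here 2.5.
model = DummyRegressor(strategy="mean").fit(X, y)

as_score = make_scorer(max_abs_error, greater_is_better=True)
as_loss = make_scorer(max_abs_error, greater_is_better=False)

raw = max_abs_error(y, model.predict(X))
print(raw)                    # 2.5
print(as_score(model, X, y))  # 2.5
print(as_loss(model, X, y))   # -2.5 (negated, so that larger is better)
```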

You can then convert your function into a scorer as follows:

import numpy as np
from sklearn.metrics import make_scorer  # the sklearn.metrics.scorer module no longer exists

def custom_loss_func(X_train_scaled, Y_train_scaled):
    # make_scorer calls this as func(y_true, y_pred): the first argument
    # receives the true targets, the second the model's predictions.
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        error_i = 0  # assumption: samples matching none of the cases add nothing
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i  # accumulate inside the loop, once per sample
    return error


custom_scorer = make_scorer(custom_loss_func, greater_is_better=True)

Then pass custom_scorer into GridSearchCV as you would any other scoring function: clf = GridSearchCV(estimator, param_grid, scoring=custom_scorer).
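As a rough end-to-end sketch with SVR, the snippet below passes a scorer to GridSearchCV in two ways: wrapped with make_scorer, and as a plain callable with the signature (estimator, X, y), which is the "callable as a scorer" route mentioned in the question. The metric mean_abs_error, the helper neg_mae_callable, the synthetic data, and the small parameter grid are all assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR


def mean_abs_error(y_true, y_pred):
    # Hypothetical stand-in metric; lower is better.
    return float(np.mean(np.abs(y_true - y_pred)))


def neg_mae_callable(estimator, X, y):
    # A scorer can also be a plain callable taking (estimator, X, y).
    # It must return a value where larger is better, hence the negation.
    return -mean_abs_error(y, estimator.predict(X))


rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

param_grid = {"C": [1.0, 10.0], "gamma": [0.1, 1.0]}

# Route 1: wrap the metric; greater_is_better=False negates it internally.
search_a = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    scoring=make_scorer(mean_abs_error, greater_is_better=False),
    cv=3,
)
# Route 2: pass the callable scorer directly.
search_b = GridSearchCV(SVR(kernel="rbf"), param_grid, scoring=neg_mae_callable, cv=3)

search_a.fit(X, y)
search_b.fit(X, y)

print(search_a.best_params_, search_a.best_score_)
print(search_b.best_params_, search_b.best_score_)
```

Both searches should select the same parameters, and best_score_ is negative in each case because the metric is being minimized.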
