Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/24458163/
What are the parameters for sklearn's score function?
Asked by tooty44
I recently looked at a bunch of sklearn tutorials, which were all similar in that they scored the goodness of fit by:
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
And it'll spit out:
0.92345...
or some other score.
I am curious as to the parameters of the clf.score function or how it scores the model. I looked all over the internet, but can't seem to find documentation for it. Does anyone know?
Accepted answer by Fred Foo
It takes a feature matrix X_test and the expected target values y_test. Predictions for X_test are compared with y_test, and either the accuracy (for classifiers) or the R^2 score (for regression estimators) is returned.
This is stated very explicitly in the docstrings for score methods. The one for classification reads:
Returns the mean accuracy on the given test data and labels.
Parameters
----------
X : array-like, shape = (n_samples, n_features)
Test samples.
y : array-like, shape = (n_samples,)
True labels for X.
sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns
-------
score : float
Mean accuracy of self.predict(X) wrt. y.
and the one for regression is similar.
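Under the docstring quoted above, a classifier's score boils down to the fraction of predictions that match the labels. A minimal pure-Python sketch (mean_accuracy is a hypothetical helper written for illustration, not part of sklearn):

```python
def mean_accuracy(y_pred, y_true):
    # The "mean accuracy of self.predict(X) wrt. y" from the docstring:
    # the fraction of predictions that equal the true labels.
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

print(mean_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

In sklearn itself, clf.score(X_test, y_test) first calls clf.predict(X_test) and then compares the result against y_test in exactly this way.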
Answered by newtover
Not sure that I understood your question correctly. Obviously, to compute some error or similarity, most scoring functions receive an array of reference values (y_true) and an array of values predicted by your model (y_score) as their main parameters, but may also receive other parameters specific to the metric. Scoring functions usually do not need the X values.
I would suggest looking into the source code of the scoring functions to understand how they work.
Here is a list of scoring functions in scikit-learn.
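As a sketch of that calling convention — reference values first, predictions second, no X required — here is a hand-rolled version of one such metric (written out for illustration rather than imported from sklearn):

```python
def mean_absolute_error(y_true, y_pred):
    # Typical metric signature: y_true first, y_pred second; X is not needed.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_absolute_error([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # ≈ 0.667
```

sklearn's own metric functions in sklearn.metrics follow this same (y_true, y_pred, ...) shape.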
Answered by Salvador Dali
This is classifier dependent. Each classifier provides its own scoring function.
Estimator score method: Estimators have a score method providing a default evaluation criterion for the problem they are designed to solve. This is not discussed on this page, but in each estimator's documentation.
Apart from the documentation given to you in one of the answers, the only additional thing you can do is read what kind of parameters your estimator provides. For example, the SVM classifier SVC has the following signature: score(X, y, sample_weight=None)
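The sample_weight parameter changes how each sample counts toward the score. A rough pure-Python sketch of the idea (weighted_accuracy is a hypothetical helper, not the actual SVC implementation):

```python
def weighted_accuracy(y_pred, y_true, sample_weight=None):
    # With weights, a correct prediction contributes its weight
    # instead of counting as 1; the total is divided by the weight sum.
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hits = sum(w for p, t, w in zip(y_pred, y_true, sample_weight) if p == t)
    return hits / sum(sample_weight)

# The same mistake costs more when its sample carries more weight:
print(weighted_accuracy([1, 0], [1, 1]))                            # 0.5
print(weighted_accuracy([1, 0], [1, 1], sample_weight=[1.0, 3.0]))  # 0.25
```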
Answered by Hammad Basit
Syntax:sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.
Parameters
----------
y_true : 1d array-like, or label indicator array / sparse matrix
    Ground truth (correct) labels.
y_pred : 1d array-like, or label indicator array / sparse matrix
    Predicted labels, as returned by a classifier.
normalize : bool, optional (default=True)
    If False, return the number of correctly classified samples.
    Otherwise, return the fraction of correctly classified samples.
sample_weight : array-like of shape = [n_samples], optional
    Sample weights.
Returns
-------
score : float
    If normalize == True, return the fraction of correctly classified
    samples (float), else return the number of correctly classified
    samples (int).
    The best performance is 1 with normalize == True and the number
    of samples with normalize == False.
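The normalize switch described above can be sketched in a few lines of plain Python (accuracy_score_sketch is an illustrative stand-in for sklearn.metrics.accuracy_score, not the real implementation):

```python
def accuracy_score_sketch(y_true, y_pred, normalize=True):
    # Count exact matches; normalize picks fraction vs. raw count.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) if normalize else correct

print(accuracy_score_sketch([0, 1, 2, 3], [0, 2, 1, 3]))                   # 0.5
print(accuracy_score_sketch([0, 1, 2, 3], [0, 2, 1, 3], normalize=False))  # 2
```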
For more information you can refer to: https://scikit-learn.org/stable/modules/model_evaluation.html#accuracy-score
Answered by Eli Safra
Here is the way the score is calculated for Regressor:
score(self, X, y, sample_weight=None): returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
From sklearn documentation.
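The definition above translates directly into code. A minimal sketch of that formula (r2_score_sketch is a hypothetical name, written out for illustration rather than taken from sklearn):

```python
def r2_score_sketch(y_true, y_pred):
    # R^2 = 1 - u/v, with u the residual sum of squares
    # and v the total sum of squares around the mean of y_true.
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean) ** 2 for t in y_true)
    return 1 - u / v

print(r2_score_sketch([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # ≈ 0.9486
# A constant prediction at the mean of y_true scores exactly 0.0:
print(r2_score_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```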