GridSearchCV.best_score_ meaning when scoring set to 'accuracy' and CV
Original question: http://stackoverflow.com/questions/44459845/
Source: StackOverflow, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Taka
I'm trying to find the best neural network model for classifying breast cancer samples on the well-known Wisconsin Cancer dataset (569 samples, 31 features + target). I'm using sklearn 0.18.1. I'm not using normalization so far; I'll add it once this question is solved.
# some init code omitted
X_train, X_test, y_train, y_test = train_test_split(X, y)
Define the NN parameter grid for GridSearchCV
tuned_params = [{'solver': ['sgd'], 'learning_rate': ['constant'], "learning_rate_init" : [0.001, 0.01, 0.05, 0.1]},
{"learning_rate_init" : [0.001, 0.01, 0.05, 0.1]}]
CV method and model
cv_method = KFold(n_splits=4, shuffle=True)
model = MLPClassifier()
Apply grid
grid = GridSearchCV(estimator=model, param_grid=tuned_params, cv=cv_method, scoring='accuracy')
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)
And if I run:
print(grid.best_score_)
print(accuracy_score(y_test, y_pred))
The result is 0.746478873239 and 0.902097902098
According to the docs, "best_score_ : float, Score of best_estimator on the left out data". I assume it is the best accuracy among those obtained by running the 8 different configurations specified in tuned_params, each the number of times specified by KFold, on the left-out data as specified by KFold. Am I right?
One more question. Is there a method to find the optimal size of the test set to use in train_test_split, which defaults to 0.25?
Thanks a lot
REFERENCES
- http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
- http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV
- http://scikit-learn.org/stable/modules/grid_search.html
- http://scikit-learn.org/stable/modules/cross_validation.html
- http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html#sphx-glr-auto-examples-model-selection-plot-nested-cross-validation-iris-py
Answered by Vivek Kumar
grid.best_score_ is the mean score over all CV folds for a single combination of the parameters you specify in tuned_params: the combination with the highest mean score, available as grid.best_params_. It is not the best score obtained on any individual fold.
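The relationship between best_score_ and the per-fold scores can be checked with a minimal sketch. It assumes the copy of the Wisconsin data bundled with scikit-learn (load_breast_cancer), a reduced two-candidate grid, and fixed random_state values; none of these come from the question. It also assumes a recent scikit-learn release. In older releases, where the iid option defaulted to True, the mean was weighted by fold size, so the plain average below may differ very slightly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neural_network import MLPClassifier

# Assumption: the dataset copy shipped with scikit-learn, not the asker's file
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduced grid (two candidates) to keep the sketch fast
param_grid = {"learning_rate_init": [0.001, 0.01]}
cv_method = KFold(n_splits=4, shuffle=True, random_state=0)

grid = GridSearchCV(
    estimator=MLPClassifier(random_state=0, max_iter=300),
    param_grid=param_grid,
    cv=cv_method,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

results = grid.cv_results_
best = grid.best_index_  # row of cv_results_ that belongs to best_params_

# Accuracy of the best candidate on each of the 4 held-out folds
fold_scores = np.array(
    [results["split%d_test_score" % i][best] for i in range(4)]
)

print("per-fold accuracy:", fold_scores)
print("mean over folds:  ", fold_scores.mean())
print("best_score_:      ", grid.best_score_)
```

The last two printed values should agree, which is why best_score_ can sit well below the accuracy measured on a separate test set: it averages four models, each trained on only three quarters of the training data.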
To access other relevant details about the grid search process, look at the grid.cv_results_ attribute.
From the documentation of GridSearchCV:
cv_results_ : dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame
It contains keys like 'split0_test_score', 'split1_test_score', 'mean_test_score', 'std_test_score', 'rank_test_score', 'split0_train_score', 'split1_train_score', 'mean_train_score', etc., which give additional information about the whole execution.
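As a rough illustration of inspecting cv_results_, the sketch below loads it into a pandas DataFrame. It assumes the bundled load_breast_cancer data, a two-candidate grid, and fixed random_state values, all chosen here for illustration. It also assumes a recent scikit-learn release, where the train-score columns are only produced when return_train_score=True is passed; that parameter does not exist in 0.18.1, which always computed them.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neural_network import MLPClassifier

# Assumption: the dataset copy shipped with scikit-learn
X, y = load_breast_cancer(return_X_y=True)

grid = GridSearchCV(
    estimator=MLPClassifier(random_state=0, max_iter=300),
    param_grid={"learning_rate_init": [0.001, 0.01]},  # reduced grid
    cv=KFold(n_splits=4, shuffle=True, random_state=0),
    scoring="accuracy",
    return_train_score=True,  # needed for *_train_score keys in recent releases
)
grid.fit(X, y)

# One row per parameter combination, one column per cv_results_ key
df = pd.DataFrame(grid.cv_results_)

columns = [
    "params",
    "split0_test_score", "split1_test_score",
    "split2_test_score", "split3_test_score",
    "mean_test_score", "std_test_score", "rank_test_score",
    "mean_train_score",
]
print(df[columns].sort_values("rank_test_score"))
```

The row with rank_test_score equal to 1 is the one whose mean_test_score is reported as best_score_.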