Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow post: http://stackoverflow.com/questions/30102973/

Time: 2020-08-19 07:57:30  Source: igfitidea

How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)

Tags: python, scikit-learn, random-forest, cross-validation

Asked by sapo_cosmico

I'm running GridSearchCV to optimize the parameters of a classifier in scikit-learn. Once I'm done, I'd like to know which parameters were chosen as the best.

Whenever I do so I get an AttributeError: 'RandomForestClassifier' object has no attribute 'best_estimator_', and I can't tell why, as it seems to be a legitimate attribute in the documentation.

from sklearn.grid_search import GridSearchCV

X = data[usable_columns]
y = data[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rfc = RandomForestClassifier(n_jobs=-1,max_features= 'sqrt' ,n_estimators=50, oob_score = True) 

param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv= 5)

print '\n',CV_rfc.best_estimator_

Yields:

AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'

Accepted answer by Ryan

You have to fit the grid search on your data before you can get the best parameter combination.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# In current scikit-learn, GridSearchCV lives in sklearn.model_selection;
# the old sklearn.grid_search module has been removed.
from sklearn.model_selection import GridSearchCV

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

rfc = RandomForestClassifier(n_jobs=-1, max_features='sqrt', n_estimators=50, oob_score=True)

# 'auto' is dropped from the grid: recent scikit-learn releases no longer
# accept it, and for classifiers it meant the same as 'sqrt'.
param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(X, y)  # best_params_ and best_estimator_ only exist after fit
print(CV_rfc.best_params_)
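
The question also used a train/test split, so here is a minimal sketch of how the fitted search might be used on held-out data. It assumes a recent scikit-learn where GridSearchCV is imported from sklearn.model_selection. The synthetic data, the small grid, and the names search, best_rfc and test_accuracy are illustrative stand-ins, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the question's data[usable_columns] / data[target]
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Deliberately small grid (an assumption) so the sketch runs quickly
param_grid = {'n_estimators': [10, 30], 'max_features': ['sqrt', 'log2']}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid=param_grid, cv=3)
search.fit(X_train, y_train)  # the step that was missing in the question

# With the default refit=True, best_estimator_ is a forest retrained on
# all of X_train using the winning parameter combination.
best_rfc = search.best_estimator_
test_accuracy = best_rfc.score(X_test, y_test)

print(search.best_params_)
print(test_accuracy)
```

The search is fitted only on the training split, so the test score stays an honest estimate for the selected model.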

Answer by rohithnama

One more point, for clarity.

The documentation says the following:

best_estimator_ : estimator or dict:

Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data.

When the grid search is run over several parameter combinations, it chooses the one with the highest score according to the given scoring function. best_estimator_ is the estimator configured with that winning combination; the parameters themselves are available as best_params_.
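
To make the selection rule concrete, the following sketch checks that the winning candidate is the one with the highest mean cross-validated score. It assumes a recent scikit-learn and uses the cv_results_, best_index_ and best_score_ attributes of a fitted GridSearchCV. The data, the small grid, and the choice of scoring='accuracy' are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Small illustrative grid; 'accuracy' is named explicitly as the scorer
param_grid = {'n_estimators': [10, 30], 'max_features': ['sqrt', 'log2']}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid=param_grid, scoring='accuracy', cv=3)
search.fit(X, y)

# One mean cross-validated score per parameter combination
mean_scores = search.cv_results_['mean_test_score']
best_index = search.best_index_

# The candidate at best_index is the one reported by best_params_
print(search.cv_results_['params'][best_index])
print(mean_scores[best_index], search.best_score_)
```

The full cv_results_ dictionary is also handy when two combinations score almost the same and you would rather pick the cheaper one by hand.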

Therefore, it can only be accessed after fitting the data.
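
As a rough illustration of that point, this sketch checks for the attribute before and after fit. It assumes a recent scikit-learn. It also shows a related case: when the search is created with refit=False, best_params_ is still set but no best_estimator_ is built. The variable names and the tiny grid are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
param_grid = {'n_estimators': [10, 20]}  # tiny grid, for illustration only

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
has_before_fit = hasattr(search, 'best_estimator_')  # search not fitted yet
search.fit(X, y)
has_after_fit = hasattr(search, 'best_estimator_')

# With refit=False the search records the best parameters but does not
# retrain a final model, so best_estimator_ is never created.
no_refit = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                        cv=3, refit=False)
no_refit.fit(X, y)
has_with_no_refit = hasattr(no_refit, 'best_estimator_')

print(has_before_fit, has_after_fit, has_with_no_refit)
print(no_refit.best_params_)
```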