Python: how to get the best estimator from GridSearchCV (random forest classifier, scikit-learn)

Question

Asked by sapo_cosmico

I'm running GridSearchCV to optimize the parameters of a classifier in scikit-learn. Once it's done, I'd like to know which parameters were chosen as the best.

Whenever I do so I get an AttributeError: 'RandomForestClassifier' object has no attribute 'best_estimator_', and I can't tell why, since it appears as a legitimate attribute in the documentation.

from sklearn.grid_search import GridSearchCV

X = data[usable_columns]
y = data[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rfc = RandomForestClassifier(n_jobs=-1,max_features= 'sqrt' ,n_estimators=50, oob_score = True) 

param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv= 5)

print '\n',CV_rfc.best_estimator_

Yields:

AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'

Answer 1

Accepted answer by Ryan

You have to fit your data before you can get the best parameter combination.

# sklearn.grid_search was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

rfc = RandomForestClassifier(n_jobs=-1, max_features='sqrt', n_estimators=50, oob_score=True)

# Candidate values to search over. 'auto' was removed in scikit-learn 1.3;
# for classifiers it was an alias of 'sqrt', so it is dropped here.
param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(X, y)  # the best_* attributes only exist after fit()
print(CV_rfc.best_params_)
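To connect this back to the question, which split the data with train_test_split, here is a minimal sketch of fitting the search on the training portion and then using best_estimator_ on the held-out portion. The synthetic data stands in for the asker's data[usable_columns] and data[target], and the smaller grid values and the name best_rfc are assumptions chosen only to keep the example quick to run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the asker's data (assumption, not the original data)
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Small illustrative grid so the search finishes quickly
param_grid = {"n_estimators": [20, 40], "max_features": ["sqrt", "log2"]}

CV_rfc = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid=param_grid, cv=5)
CV_rfc.fit(X_train, y_train)  # must come before any best_* access

# best_estimator_ is a fitted RandomForestClassifier, refit on all of
# X_train with the winning parameter combination
best_rfc = CV_rfc.best_estimator_
test_accuracy = best_rfc.score(X_test, y_test)
print(best_rfc)
print(test_accuracy)
```

Because refit=True is the default, CV_rfc.predict(X_test) would delegate to the same refit model.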

Answer 2

Answered by rohithnama

One more point, to make this clear.

The documentation says the following:

best_estimator_ : estimator or dict:
Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data.

When the grid search is fit over a grid of parameters, it picks the combination with the highest score according to the given scoring function. best_estimator_ is the estimator refit with that combination (available when refit=True, the default); the parameters themselves are in best_params_.
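As a rough illustration of how the chosen combination relates to the scores, the sketch below passes an explicit scoring function and checks the winner against cv_results_. The dataset, the grid values, and the choice of "f1" as the scorer are assumptions for the example only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic binary classification problem (assumption for illustration)
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_features": ["sqrt", "log2"]},
    scoring="f1",  # the scorer used to rank parameter combinations
    cv=3,
)
search.fit(X, y)

# One mean cross-validated score per parameter combination
mean_scores = search.cv_results_["mean_test_score"]
all_params = search.cv_results_["params"]

# best_index_ points at the winning row of cv_results_
winner = search.best_index_
print(all_params[winner], mean_scores[winner])
print(search.best_params_, search.best_score_)
```

The two printed lines should show the same parameters and score, since best_params_ and best_score_ are read from the row of cv_results_ at best_index_.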

Therefore, these attributes are only available after the search has been fit to the data.
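A small sketch of that behaviour, assuming a synthetic dataset and a deliberately tiny grid: the attribute is simply absent on a freshly constructed GridSearchCV, which is what produces the AttributeError in the question, and it appears once fit() has run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Tiny synthetic problem and grid, chosen only to keep the example fast
X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid={"n_estimators": [5, 10]}, cv=3)

# Before fit(): the attribute does not exist yet
available_before = hasattr(search, "best_estimator_")

search.fit(X, y)

# After fit(): the refit best model is attached to the search object
available_after = hasattr(search, "best_estimator_")

print(available_before, available_after)
```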
