Python: Using GridSearchCV with AdaBoost and DecisionTreeClassifier

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/32210569/

Time: 2020-08-19 11:13:14  Source: igfitidea

Using GridSearchCV with AdaBoost and DecisionTreeClassifier

python scikit-learn decision-tree adaboost grid-search

Asked by GPB

I am attempting to tune an AdaBoost classifier ("ABT") that uses a DecisionTreeClassifier ("DTC") as its base_estimator. I would like to tune both the ABT and DTC parameters simultaneously, but I am not sure how to accomplish this. A pipeline shouldn't work, as I am not "piping" the output of DTC to ABT. The idea would be to iterate over hyperparameters for both ABT and DTC in the GridSearchCV estimator.

How can I specify the tuning parameters correctly?

I tried the following, which produced the error shown below.

[IN]
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.grid_search import GridSearchCV

param_grid = {dtc__criterion : ["gini", "entropy"],
              dtc__splitter :   ["best", "random"],
              abc__n_estimators: [none, 1, 2]
             }


DTC = DecisionTreeClassifier(random_state = 11, max_features = "auto", class_weight = "auto",max_depth = None)

ABC = AdaBoostClassifier(base_estimator = DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')

[OUT]
ValueError: Invalid parameter dtc for estimator AdaBoostClassifier(algorithm='SAMME.R',
      base_estimator=DecisionTreeClassifier(class_weight='auto', criterion='gini', max_depth=None,
        max_features='auto', max_leaf_nodes=None, min_samples_leaf=1,
        min_samples_split=2, min_weight_fraction_leaf=0.0,
        random_state=11, splitter='best'),
      learning_rate=1.0, n_estimators=50, random_state=11)

Accepted answer by ldirer

There are several things wrong in the code you posted:

  1. The keys of the param_grid dictionary need to be strings. You should be getting a NameError.
  2. The key "abc__n_estimators" should just be "n_estimators": you are probably mixing this up with the pipeline syntax. Here nothing tells Python that the string "abc" represents your AdaBoostClassifier.
  3. None (capitalized, not none) is the correct Python spelling, but it is still not a valid value for n_estimators. The default value (probably what you meant) is 50.
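One way to check which keys a grid may use is to ask the estimator itself. The following is only a minimal sketch, not part of the original answer. It assumes a scikit-learn install where AdaBoostClassifier accepts either `base_estimator` (older releases, as in this question) or `estimator` (newer releases), and it picks the name at runtime; the variable names `inner`, `valid_keys`, and `rejected` are made up for illustration.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption of this sketch: the wrapped-tree argument is called
# "base_estimator" in older scikit-learn releases and "estimator" in newer
# ones, so the name is detected at runtime instead of being hard-coded.
inner = "estimator" if "estimator" in AdaBoostClassifier().get_params() else "base_estimator"

abc = AdaBoostClassifier(**{inner: DecisionTreeClassifier(random_state=11)})

# Every name returned by get_params() is a legal key for param_grid.
valid_keys = sorted(abc.get_params().keys())
print([k for k in valid_keys if k.startswith(inner + "__")])

# set_params accepts the same "<argument>__<parameter>" names as a grid.
abc.set_params(**{inner + "__criterion": "entropy", "n_estimators": 50})

# A made-up prefix such as "dtc__" is rejected, as in the question.
rejected = False
try:
    abc.set_params(dtc__criterion="gini")
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

The prefix before "__" has to be the name of a constructor argument of the outer estimator, which is why "dtc" and "abc" are not recognized: they are just variable-like labels that AdaBoostClassifier knows nothing about.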

Here's the code with these fixes. To set the parameters of your Tree estimator you can use the "__" syntax that allows accessing nested parameters.

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.grid_search import GridSearchCV

param_grid = {"base_estimator__criterion" : ["gini", "entropy"],
              "base_estimator__splitter" :   ["best", "random"],
              "n_estimators": [1, 2]
             }


DTC = DecisionTreeClassifier(random_state = 11, max_features = "auto", class_weight = "auto",max_depth = None)

ABC = AdaBoostClassifier(base_estimator = DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')
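The snippet above targets the scikit-learn release that was current when the answer was written. For readers on a more recent release, here is a hedged sketch of the same idea that also calls fit. Several things in it are assumptions rather than part of the original answer: GridSearchCV is imported from sklearn.model_selection (the sklearn.grid_search module no longer exists in recent releases), the name of the wrapped-tree argument is detected at runtime, the "auto" values for max_features and class_weight are left out because, as far as I know, recent releases no longer accept them, and the data, max_depth=1, cv=3, and the n_estimators values are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=11)

# Assumption: the argument is "base_estimator" in older releases and
# "estimator" in newer ones; detect which one this install provides.
inner = "estimator" if "estimator" in AdaBoostClassifier().get_params() else "base_estimator"

# Decision stumps as the weak learner (an illustrative choice).
dtc = DecisionTreeClassifier(random_state=11, max_depth=1)
abc = AdaBoostClassifier(random_state=11, **{inner: dtc})

# Tree parameters are reached through "<argument>__<parameter>";
# AdaBoost's own parameters are used without any prefix.
param_grid = {
    inner + "__criterion": ["gini", "entropy"],
    inner + "__splitter": ["best", "random"],
    "n_estimators": [25, 50],
}

# Run the grid search with 3-fold cross-validation.
grid_search = GridSearchCV(abc, param_grid=param_grid, scoring="roc_auc", cv=3)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(grid_search.best_score_)
```

After fitting, best_params_ holds one value per grid key, so the tree settings and the AdaBoost settings are tuned together in a single search, which is what the question asked for.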

Also, using only 1 or 2 estimators does not really make sense for AdaBoost, but I'm guessing this is not the actual code you're running.

Hope this helps.