pandas GridSearchCV: "TypeError: 'StratifiedKFold' object is not iterable"

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address: StackOverflow, http://stackoverflow.com/questions/40257492/

Date: 2020-09-14 02:17:15  Source: igfitidea

GridSearchCV: "TypeError: 'StratifiedKFold' object is not iterable"

Tags: pandas, scikit-learn, grid-search, sklearn-pandas

Asked by user183897

I want to perform GridSearchCV in a RandomForestClassifier, but data is not balanced, so I use StratifiedKFold:

from sklearn.model_selection import StratifiedKFold
from sklearn.grid_search import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {'n_estimators':[10, 30, 100, 300], "max_depth": [3, None],
          "max_features": [1, 5, 10], "min_samples_leaf": [1, 10, 25, 50], "criterion": ["gini", "entropy"]}

rfc = RandomForestClassifier()

clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold()).fit(X_train, y_train)

But I get an error:

TypeError                                 Traceback (most recent call last)
<ipython-input-597-b08e92c33165> in <module>()
     9 rfc = RandomForestClassifier()
     10 
---> 11 clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold()).fit(X_train, y_train)

c:\python34\lib\site-packages\sklearn\grid_search.py in fit(self, X, y)
    811 
    812         """
--> 813         return self._fit(X, y, ParameterGrid(self.param_grid))

c:\python34\lib\site-packages\sklearn\grid_search.py in _fit(self, X, y, parameter_iterable)
    559                                     self.fit_params, return_parameters=True,
    560                                     error_score=self.error_score)
--> 561                 for parameters in parameter_iterable
    562                 for train, test in cv)

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
    601 
    602         with self._lock:
--> 603             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    604             if len(tasks) == 0:
    605                 # No more tasks available in the iterator: tell caller to stop.

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in __init__(self, iterator_slice)
    125 
    126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
    128         self._size = len(self.items)

c:\python34\lib\site-packages\sklearn\grid_search.py in <genexpr>(.0)
    560                                     error_score=self.error_score)
    561                 for parameters in parameter_iterable
--> 562                 for train, test in cv)
    563 
    564         # Out is a list of triplet: score, estimator, n_test_samples

TypeError: 'StratifiedKFold' object is not iterable

When I write cv=StratifiedKFold(y_train), I get ValueError: The number of folds must be of Integral type. But when I write cv=5, it works.

I don't understand what is wrong with StratifiedKFold.

Answered by seralouk

I had exactly the same problem. The solution that worked for me is to replace:

from sklearn.grid_search import GridSearchCV

with

from sklearn.model_selection import GridSearchCV


Then it should work fine.
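
As a rough illustration of that import swap, here is a minimal sketch of the question's setup using sklearn.model_selection throughout. The synthetic imbalanced data from make_classification and the reduced parameter grid are assumptions made only so the example runs quickly; they are not from the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data standing in for X_train / y_train (an assumption).
X_train, y_train = make_classification(
    n_samples=200, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Deliberately small grid so the sketch runs quickly.
param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}

rfc = RandomForestClassifier(random_state=0)

# model_selection.GridSearchCV accepts a splitter object as cv directly.
clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold(n_splits=5))
clf.fit(X_train, y_train)

print(clf.best_params_)
```

Both classes come from the same module here, so GridSearchCV knows to call split on the StratifiedKFold object itself.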

Answered by rll

The problem here is an API change, as mentioned in other answers. However, those answers could be more explicit.

The cv parameter documentation states:

cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 3-fold cross-validation.

  • integer, to specify the number of folds.

  • An object to be used as a cross-validation generator.

  • An iterable yielding train/test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
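
The integer rule quoted above can be checked with the public helper sklearn.model_selection.check_cv, which resolves a cv argument into a splitter object. This is a minimal sketch, assuming a scikit-learn version that ships sklearn.model_selection; the toy labels are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # toy binary labels (assumed data)

# Integer cv + classifier + binary y resolves to a stratified splitter.
cv_clf = check_cv(4, y, classifier=True)

# Integer cv for a non-classifier resolves to plain KFold.
cv_reg = check_cv(4, y, classifier=False)

print(type(cv_clf).__name__, type(cv_reg).__name__)
```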

So, whatever cross-validation strategy is used, all that is needed is to provide the generator by calling the splitter's split method, as suggested:

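from sklearn.model_selection import GridSearchCV, StratifiedKFold

# estimator, parameters, qwk (a scorer), xtrain and ytrain are defined elsewhere
kfolds = StratifiedKFold(5)
clf = GridSearchCV(estimator, parameters, scoring=qwk, cv=kfolds.split(xtrain, ytrain))
clf.fit(xtrain, ytrain)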
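
To see what split actually hands to GridSearchCV, the following sketch lists the train/test index pairs it yields. The tiny arrays X and y are assumed toy data, chosen so the class imbalance is easy to inspect by eye.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)    # 10 toy samples (assumed data)
y = np.array([0] * 8 + [1] * 2)     # imbalanced labels: 8 vs 2

kfolds = StratifiedKFold(n_splits=2)

# split() is a generator of (train_indices, test_indices) pairs.
splits = list(kfolds.split(X, y))

for train_idx, test_idx in splits:
    # Each test fold keeps the 4:1 class ratio of the full label vector.
    print(len(train_idx), len(test_idx), int(y[test_idx].sum()))
```

This is the "iterable yielding train/test splits" form of cv from the documentation quote, which is why it is accepted regardless of where the splitter class came from.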

Answered by ebrahimi

It seems that cv=StratifiedKFold() should be changed to cv=StratifiedKFold().split(X_train, y_train), keeping the final .fit(X_train, y_train) call on the GridSearchCV object.

Answered by simon

The API changed in the latest version. You used to pass y when creating the StratifiedKFold object; now you pass only the number of folds to the constructor and pass y later, when calling split.
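
A small sketch of that difference, assuming the sklearn.model_selection version of StratifiedKFold and made-up toy arrays: passing the labels to the constructor is rejected because the first argument is now the number of folds, while passing them to split works.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.zeros((6, 1))                 # toy features (assumed data)
y = np.array([0, 0, 0, 1, 1, 1])     # toy labels (assumed data)

# Old-style call: labels go to the constructor. In model_selection the first
# argument is n_splits, so an array is rejected with a ValueError.
try:
    StratifiedKFold(y)
    old_style_error = None
except ValueError as exc:
    old_style_error = str(exc)

# New-style call: only the fold count at construction, labels passed to split().
skf = StratifiedKFold(n_splits=3)
n_folds = sum(1 for _ in skf.split(X, y))

print(old_style_error)
print(n_folds)
```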
