Original source: http://stackoverflow.com/questions/35632634/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to pass a parameter to only one part of a pipeline object in scikit learn?
Asked by Sother
I need to pass a parameter, sample_weight, to my RandomForestClassifier like so:
import numpy as np
import sklearn.ensemble

# Four samples with 28 features each; the trailing '0' strings are kept as in the original post.
X = np.array([[2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,
               1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 3.0,
               2.0, '0'],
              [15.0, 2.0, 5.0, 5.0, 0.466666666667, 4.0, 3.0, 2.0, 0.0, 0.0, 0.0,
               0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
               7.0, 14.0, 2.0, '0'],
              [3.0, 4.0, 3.0, 1.0, 1.33333333333, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0,
               0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
               9.0, 8.0, 2.0, '0'],
              [3.0, 2.0, 3.0, 0.0, 0.666666666667, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0,
               0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
               5.0, 3.0, 1.0, '0']], dtype=object)
y = np.array([0., 0., 1., 0.])
m = sklearn.ensemble.RandomForestClassifier(
    random_state=0,
    oob_score=True,
    n_estimators=100,
    min_samples_leaf=5,
    max_depth=10)
# One weight per sample, passed directly to the estimator's fit method.
m.fit(X, y, sample_weight=np.array([3, 4, 2, 3]))
The above code works perfectly fine. Then I try to do the same thing with a pipeline object instead of the random forest alone, like so:
import sklearn.feature_selection
import sklearn.pipeline

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0,
        oob_score=True,
        n_estimators=500,
        min_samples_leaf=5,
        max_depth=10))])
m.fit(X, y, sample_weight=np.array([3,4,2,3]))
Now this breaks in the fit method with "ValueError: need more than 1 value to unpack".
ValueError Traceback (most recent call last)
<ipython-input-212-c4299f5b3008> in <module>()
25 max_depth=10))])
26
---> 27 m.fit(X, y, sample_weights=np.array([3,4,2,3]))
/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
128 data, then fit the transformed data using the final estimator.
129 """
--> 130 Xt, fit_params = self._pre_transform(X, y, **fit_params)
131 self.steps[-1][-1].fit(Xt, y, **fit_params)
132 return self
/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
113 fit_params_steps = dict((step, {}) for step, _ in self.steps)
114 for pname, pval in six.iteritems(fit_params):
--> 115 step, param = pname.split('__', 1)
116 fit_params_steps[step][param] = pval
117 Xt = X
ValueError: need more than 1 value to unpack
I am using sklearn version 0.14.
I think the problem is that the F selection step in the pipeline does not take an argument for sample_weights. How do I pass this parameter to only one step in the pipeline when I run "fit"? Thanks.
Answered by ali_m
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a '__', as in the example below.
So you can simply insert model__ in front of whatever fit parameter kwargs you want to pass to your 'model' step:
m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
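To see why the prefix matters, here is a minimal standard-library sketch of the name splitting shown in the traceback from the question (the `pname.split('__', 1)` line in `_pre_transform`). The helper `split_fit_params` is a hypothetical name used only for illustration, not part of scikit-learn, and newer scikit-learn versions may route fit parameters differently internally.

```python
def split_fit_params(step_names, fit_params):
    """Group fit keyword arguments by pipeline step (illustrative only)."""
    per_step = {name: {} for name in step_names}
    for pname, pval in fit_params.items():
        # 'model__sample_weight' splits into ('model', 'sample_weight').
        # A name without '__' yields a single piece, so the two-name
        # unpacking below raises ValueError, as in the question.
        step, param = pname.split('__', 1)
        per_step[step][param] = pval
    return per_step


steps = ['feature_selection', 'model']

# Prefixed name: routed to the 'model' step only.
routed = split_fit_params(steps, {'model__sample_weight': [3, 4, 2, 3]})
print(routed)

# Unprefixed name: the split cannot be unpacked into (step, param).
try:
    split_fit_params(steps, {'sample_weight': [3, 4, 2, 3]})
    unprefixed_failed = False
except ValueError as exc:
    unprefixed_failed = True
    print('unprefixed name rejected:', exc)
```

Under this sketch, the 'feature_selection' step receives no fit parameters at all, which is what the question asks for.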
Answered by rovyko
You can also use the method set_params and prepend the name of the step.
m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0,
        oob_score=True,
        n_estimators=500,
        min_samples_leaf=5,
        max_depth=10))])
m.set_params(model__sample_weight=np.array([3,4,2,3]))
Answered by Anshul
I wish I could leave a comment on @rovyko's post above instead of a separate answer, but I don't have enough StackOverflow reputation yet to leave comments, so here it is instead.
You cannot use:
Pipeline.set_params(model__sample_weight=np.array([3,4,2,3]))
to set parameters for the RandomForestClassifier.fit() method. Pipeline.set_params(), as indicated in its source code, only sets initialization parameters for the individual steps in the Pipeline. RandomForestClassifier has no initialization parameter called sample_weight (see its __init__() method). sample_weight is actually an input parameter to RandomForestClassifier's fit() method, and can therefore only be set by the method presented in the correctly marked answer by @ali_m, which is:
m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
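The difference between the two approaches can be checked with a small self-contained sketch. The random data, the smaller forest, and the use of f_classif in place of the question's f_regression are assumptions made only to keep the example quick and suited to class labels; the behaviour shown assumes a scikit-learn version where metadata routing is left at its default (disabled).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption for illustration, not the question's data).
rng = np.random.RandomState(0)
X = rng.rand(40, 6)
y = np.array([0, 1] * 20)
weights = rng.randint(1, 5, size=40)

m = Pipeline([
    ('feature_selection', SelectKBest(score_func=f_classif, k=3)),
    ('model', RandomForestClassifier(n_estimators=10, random_state=0)),
])

# Constructor arguments of a step can be changed through set_params.
m.set_params(model__n_estimators=20)

# sample_weight is not a constructor argument, so set_params rejects it.
try:
    m.set_params(model__sample_weight=weights)
    set_params_rejected = False
except ValueError:
    set_params_rejected = True

# Fit-time arguments go through fit, prefixed with the step name.
m.fit(X, y, model__sample_weight=weights)
print(set_params_rejected, m.named_steps['model'].n_estimators)
```

In short, set_params is for values a step accepts in its constructor, while the step-name prefix in fit is for values a step accepts in its own fit method.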