pandas: How to pass a parameter to only one part of a pipeline object in scikit-learn?

Question

Asked by Sother

I need to pass a parameter, sample_weight, to my RandomForestClassifier, like so:

import numpy as np
import sklearn.ensemble

X = np.array([[2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 3.0,
        2.0, '0'],
       [15.0, 2.0, 5.0, 5.0, 0.466666666667, 4.0, 3.0, 2.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
        7.0, 14.0, 2.0, '0'],
       [3.0, 4.0, 3.0, 1.0, 1.33333333333, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
        9.0, 8.0, 2.0, '0'],
       [3.0, 2.0, 3.0, 0.0, 0.666666666667, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
        5.0, 3.0, 1.0, '0']], dtype=object)

y = np.array([ 0.,  0.,  1.,  0.])

m = sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=100,
        min_samples_leaf=5, 
        max_depth=10)

m.fit(X, y, sample_weight=np.array([3,4,2,3]))

The above code works perfectly fine. Then I try to do the same thing with a pipeline object instead of the random forest alone:

import sklearn.feature_selection
import sklearn.pipeline

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=500,
        min_samples_leaf=5, 
        max_depth=10))])

m.fit(X, y, sample_weight=np.array([3,4,2,3]))

Now this breaks in the fit method with "ValueError: need more than 1 value to unpack".

ValueError                                Traceback (most recent call last)
<ipython-input-212-c4299f5b3008> in <module>()
     25         max_depth=10))])
     26 
---> 27 m.fit(X, y, sample_weights=np.array([3,4,2,3]))

/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
    128         data, then fit the transformed data using the final estimator.
    129         """
--> 130         Xt, fit_params = self._pre_transform(X, y, **fit_params)
    131         self.steps[-1][-1].fit(Xt, y, **fit_params)
    132         return self

/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
    113         fit_params_steps = dict((step, {}) for step, _ in self.steps)
    114         for pname, pval in six.iteritems(fit_params):
--> 115             step, param = pname.split('__', 1)
    116             fit_params_steps[step][param] = pval
    117         Xt = X

ValueError: need more than 1 value to unpack

I am using sklearn version 0.14.
I think the problem is that the F selection step in the pipeline does not accept a sample_weight argument. How do I pass this parameter to only one step in the pipeline when I run "fit"? Thanks.

Answer 1

Answered by ali_m

From the documentation:

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a '__', as in the example below.

So you can simply insert model__ in front of whatever fit parameter kwargs you want to pass to your 'model' step:

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
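
As a rough illustration of why the prefix is needed, here is a minimal sketch on synthetic data. It assumes a recent scikit-learn release; the names pipe, X_demo, y_demo and w_demo are made up for this example, and f_classif with k=2 is swapped in for the question's f_regression and k=25 only to suit the toy data. The first two lines mimic the split on '__' that appears in the traceback.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# The pipeline code in the traceback splits each fit keyword on the first "__".
unprefixed = "sample_weight".split("__", 1)        # one piece: cannot unpack into (step, param)
prefixed = "model__sample_weight".split("__", 1)   # two pieces: step name and parameter name

# Synthetic data (an assumption for illustration, not the question's data).
rng = np.random.RandomState(0)
X_demo = rng.rand(40, 5)
y_demo = (X_demo[:, 0] > 0.5).astype(int)
w_demo = rng.randint(1, 5, size=40)

pipe = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=2)),
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# The "model__" prefix sends sample_weight to the final step's fit() only;
# the feature-selection step never receives it.
pipe.fit(X_demo, y_demo, model__sample_weight=w_demo)
pred = pipe.predict(X_demo)
```

The step name before the double underscore must match the name given in the pipeline's list of steps, here "model".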

Answer 2

Answered by rovyko

You can also use the method set_params and prepend the name of the step.

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=500,
        min_samples_leaf=5, 
        max_depth=10))])

m.set_params(model__sample_weight=np.array([3,4,2,3]))

Answer 3

Answered by Anshul

I wish I could leave a comment on @rovyko's post above instead of writing a separate answer, but I don't have enough Stack Overflow reputation to comment yet, so here it is instead.

You cannot use:

Pipeline.set_params(model__sample_weight=np.array([3,4,2,3]))

to set parameters for the RandomForestClassifier.fit() method. As the Pipeline source code indicates, Pipeline.set_params() only sets initialization parameters for the individual steps in the Pipeline. RandomForestClassifier has no initialization parameter called sample_weight (see its __init__() method). sample_weight is actually an input parameter to RandomForestClassifier's fit() method and can therefore only be set by the method presented in the correctly marked answer by @ali_m, which is:

m.fit(X, y, model__sample_weight=np.array([3,4,2,3])).
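
The distinction can be checked with a small sketch, assuming a recent scikit-learn release (the pipeline below is a cut-down stand-in with illustrative settings, not the question's exact configuration). set_params accepts a constructor parameter such as n_estimators, but rejects sample_weight because it is not a constructor parameter of the 'model' step.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=2)),
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# Constructor (initialization) parameters can be set through set_params.
pipe.set_params(model__n_estimators=20)
n_estimators_after = pipe.get_params()["model__n_estimators"]

# sample_weight is an argument of fit(), not of __init__(), so set_params refuses it.
try:
    pipe.set_params(model__sample_weight=[3, 4, 2, 3])
    set_params_failed = False
except ValueError as err:
    set_params_failed = True
    print("set_params rejected it:", err)
```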
