Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/30026960/

Date: 2020-08-19 07:51:42  Source: igfitidea

TypeError: get_params() missing 1 required positional argument: 'self'

python scikit-learn

Asked by Xiangru Lian

I was trying to use the scikit-learn package with Python 3.4 to do a grid search:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.preprocessing import LabelBinarizer
import numpy as np

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10)
}

if __name__ == '__main__':
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy', cv = 3)
    df = pd.read_csv('SMS Spam Collection/SMSSpamCollection', delimiter='\t', header=None)
    lb = LabelBinarizer()
    X, y = df[1], np.array([number[0] for number in lb.fit_transform(df[0])])
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    grid_search.fit(X_train, y_train)
    print('Best score: ', grid_search.best_score_)
    print('Best parameter set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(best_parameters):
        print(param_name, best_parameters[param_name])

However, it does not run successfully. The error message looks like this:

Fitting 3 folds for each of 1536 candidates, totalling 4608 fits
Traceback (most recent call last):
  File "/home/xiangru/PycharmProjects/machine_learning_note_with_sklearn/grid search.py", line 36, in <module>
    grid_search.fit(X_train, y_train)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 732, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 493, in _fit
    base_estimator = clone(self.estimator)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 47, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: get_params() missing 1 required positional argument: 'self'

I also tried using only:

if __name__ == '__main__':
    pipeline.get_params()

It gives the same error message. Does anyone know how to fix this?

Accepted answer by Xiangru Lian

I finally got the problem solved. The reason is exactly what abarnert said.

First I tried:

pipeline = LogisticRegression()

parameters = {
    'penalty': ('l1', 'l2'),
    'C': (0.01, 0.1, 1, 10)
}

and it worked well.

With that intuition, I modified the pipeline to be:

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])

Note that there is a () after LogisticRegression. This time it works.
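
A quick way to catch this kind of mistake before fitting is to scan the step list for entries that are classes rather than instances. The sketch below is only an illustration: Vectorizer, Classifier and find_uninstantiated_steps are hypothetical names made up for this example (stand-ins for TfidfVectorizer and LogisticRegression), not part of scikit-learn.

```python
import inspect


class Vectorizer:
    """Hypothetical stand-in for TfidfVectorizer."""


class Classifier:
    """Hypothetical stand-in for LogisticRegression."""


def find_uninstantiated_steps(steps):
    """Return the names of steps whose estimator is a class, not an instance."""
    return [name for name, estimator in steps if inspect.isclass(estimator)]


# The second step is missing its parentheses, as in the question.
broken_steps = [("vect", Vectorizer()), ("clf", Classifier)]
fixed_steps = [("vect", Vectorizer()), ("clf", Classifier())]

print(find_uninstantiated_steps(broken_steps))  # ['clf']
print(find_uninstantiated_steps(fixed_steps))   # []
```

The same check should work on a real list of (name, estimator) tuples, since inspect.isclass only looks at whether the object is a class.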

Answer by abarnert

This error is almost always misleading, and actually means that you're calling an instance method on the class, rather than the instance (like calling dict.keys() instead of d.keys() on a dict named d).*
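
The error is easy to reproduce with a small user-defined class. In this minimal sketch, Estimator is a hypothetical class invented for the example; a user-defined class is used because built-in types such as dict word the message differently, and the exact prefix of the message varies between Python versions.

```python
class Estimator:
    """Hypothetical class with a scikit-learn-style get_params() method."""

    def get_params(self, deep=True):
        return {"C": 1.0}


# Called on an instance: self is filled in automatically.
print(Estimator().get_params())

# Called on the class itself: nothing is bound to self, so Python complains.
message = None
try:
    Estimator.get_params()
except TypeError as exc:
    message = str(exc)
    print(message)
```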

And that's exactly what's going on here. The docs imply that the best_estimator_ attribute, like the estimator parameter to the initializer, is not an estimator instance, it's an estimator type, and "A object of that type is instantiated for each grid point."

So, if you want to call methods, you have to construct an object of that type, for some particular grid point.


However, from a quick glance at the docs, if you're trying to get the params that were used for the particular instance of the best estimator that returned the best score, isn't that just going to be best_params_? (I apologize that this part is a bit of a guess…)



For the Pipeline call, you definitely have an instance there. And the only documentation for that method is a param spec which shows that it takes one optional argument, deep. But under the covers, it's probably forwarding the get_params() call to one of its attributes. And with ('clf', LogisticRegression), it looks like you're constructing it with the class LogisticRegression, rather than an instance of that class, so if that's what it ends up forwarding to, that would explain the problem.
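
The forwarding guess can be illustrated with a toy model. ToyEstimator and ToyPipeline below are hypothetical classes written for this example; they are not scikit-learn's implementation and only mimic the idea of a container whose get_params() delegates to each of its steps.

```python
class ToyEstimator:
    """Hypothetical estimator with a scikit-learn-style get_params()."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=True):
        return {"C": self.C}


class ToyPipeline:
    """Hypothetical pipeline that forwards get_params() to every step."""

    def __init__(self, steps):
        self.steps = steps

    def get_params(self, deep=True):
        params = {"steps": self.steps}
        if deep:
            for name, step in self.steps:
                # Forwarded call: this only works if step is an instance.
                for key, value in step.get_params().items():
                    params[name + "__" + key] = value
        return params


good = ToyPipeline([("clf", ToyEstimator())])
print(good.get_params()["clf__C"])  # 1.0

# Passing the class instead of an instance makes the forwarded call fail,
# even though get_params() was called on a real ToyPipeline instance.
bad = ToyPipeline([("clf", ToyEstimator)])
try:
    bad.get_params()
except TypeError as exc:
    print("forwarded call failed:", exc)
```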



* The reason the error says "missing 1 required positional argument: 'self'" instead of "must be called on an instance" or something is that in Python, d.keys() is effectively turned into dict.keys(d), and it's perfectly legal (and sometimes useful) to call it that way explicitly, so Python can't really tell you that dict.keys() is illegal, just that it's missing the self argument.
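
The equivalence described in the footnote can be checked directly. This short sketch uses only built-in types; the variable names are chosen for illustration.

```python
d = {"a": 1, "b": 2}

# Calling the method on the instance and calling it through the class
# with the instance passed explicitly give the same result.
print(list(d.keys()) == list(dict.keys(d)))  # True

# One case where the explicit form is useful: passing the method itself
# to map(), which supplies each string as the self argument.
upper = list(map(str.upper, ["spam", "ham"]))
print(upper)  # ['SPAM', 'HAM']
```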