Python: Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license, cite the original source, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/43222882/

Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection

python, pandas, scikit-learn, pipeline, train-test-split

Asked by manisha

Not sure how to fix this. Any help is much appreciated. I saw this Vectorization: Not a valid collection question, but I am not sure I understood it.

train = df1.iloc[:,[4,6]]
target =df1.iloc[:,[0]]

def train(classifier, X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
    classifier.fit(X_train, y_train)
    print ("Accuracy: %s" % classifier.score(X_test, y_test))
    return classifier

trial1 = Pipeline([
         ('vectorizer', TfidfVectorizer()),
         ('classifier', MultinomialNB()),])

train(trial1, train, target)

Error below:

    ----> 6 train(trial1, train, target)

    <ipython-input-140-ac0e8d32795e> in train(classifier, X, y)
          1 def train(classifier, X, y):
    ----> 2     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
          3 
          4     classifier.fit(X_train, y_train)
          5     print ("Accuracy: %s" % classifier.score(X_test, y_test))

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
       1687         test_size = 0.25
       1688 
    -> 1689     arrays = indexable(*arrays)
       1690 
       1691     if stratify is not None:

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in indexable(*iterables)
        204         else:
        205             result.append(np.array(X))
    --> 206     check_consistent_length(*result)
        207     return result
        208 

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
        175     """
        176 
    --> 177     lengths = [_num_samples(X) for X in arrays if X is not None]
        178     uniques = np.unique(lengths)
        179     if len(uniques) > 1:

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in <listcomp>(.0)
        175     """
        176 
    --> 177     lengths = [_num_samples(X) for X in arrays if X is not None]
        178     uniques = np.unique(lengths)
        179     if len(uniques) > 1:

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in _num_samples(x)
        124         if len(x.shape) == 0:
        125             raise TypeError("Singleton array %r cannot be considered"
    --> 126                             " a valid collection." % x)
        127         return x.shape[0]
        128     else:

    TypeError: Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection.

Accepted answer by Vivek Kumar

This error arises because your function train masks your variable train, and hence the function is passed to itself.

Explanation:

You define a variable train like this:

train = df1.iloc[:,[4,6]]

Then after some lines, you define a method train like this:

def train(classifier, X, y):

So what actually happens is that your earlier binding of train is replaced by the new one. That means train no longer points to the DataFrame you wanted, but to the function you defined. You can see this in the error message:

array(<function train at 0x7f3a311320d0>, dtype=object)

Note the function train inside the error message.

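A small self-contained sketch of that rebinding (illustrative only, not from the original post): in Python a name refers to one object at a time, so a later def statement silently replaces whatever the name pointed to before.

train = [1, 2, 3]     # `train` is bound to a list here

def train():          # this def rebinds the name `train` to a function
    pass

print(train)          # prints <function train at 0x...>; the list is no longer reachable via `train`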

Solution:

Rename one of them (the variable or the function). Suggestion: rename the function to something else, such as training or training_func.

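A minimal, self-contained sketch of that fix (the toy documents and labels below are illustrative and stand in for the columns of df1 in the question): with the function renamed, the name train keeps pointing at the data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train = ["good movie", "bad film", "great plot", "terrible acting"]   # the data keeps the name `train`
target = [1, 0, 1, 0]

def training_func(classifier, X, y):       # renamed, so it no longer shadows `train`
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
    classifier.fit(X_train, y_train)
    print("Accuracy: %s" % classifier.score(X_test, y_test))
    return classifier

trial1 = Pipeline([
         ('vectorizer', TfidfVectorizer()),
         ('classifier', MultinomialNB()),])

training_func(trial1, train, target)       # `train` is still the data, not a function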

Answered by dopexxx

I got the same error in another context (sklearn train_test_split), and the reason was simply that I had passed a positional argument as a keyword argument, which led the called function to misinterpret it.

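The answer does not show the exact call, but as a hypothetical illustration of how such an argument mix-up produces the same message in train_test_split: if an option value ends up among the positional data arrays, scikit-learn wraps the scalar in a 0-d array and raises the same singleton-array TypeError.

from sklearn.model_selection import train_test_split

X = [[0], [1], [2], [3]]
y = [0, 1, 0, 1]

# Mix-up: 0.25 was meant as test_size, but passed positionally it is treated
# as a third data array, so it would raise "Singleton array array(0.25) ...":
# train_test_split(X, y, 0.25)

# Intended call:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)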

Answered by Info5ek

A variation on the first answer: another reason you could get this error is if a column name in your data is the same as an attribute or method of the object containing the data.

In my case, I was trying to access the column "count" in the dataframe "df" with the ostensibly legal syntax df.count.

However, count is an existing attribute (method) of pandas DataFrame objects, and the resulting name collision produces this rather befuddling error.

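A hypothetical illustration of that collision (toy data, not from the answer): attribute access returns the DataFrame.count method instead of the "count" column, so scikit-learn ends up with a 0-d object array; bracket indexing avoids the clash.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"text": ["a b", "c d", "e f", "g h"],
                   "count": [1, 0, 1, 0]})

# df.count is the DataFrame.count method, not the column, so this would raise
# the singleton-array TypeError:
# train_test_split(df["text"], df.count, test_size=0.25)

# Selecting the column with brackets works as intended:
X_train, X_test, y_train, y_test = train_test_split(df["text"], df["count"], test_size=0.25, random_state=0)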

Answered by Orhan Celik

I got the same error with sklearn.model_selection's train_test_split, but in my case the reason was that I was passing arrays derived from a Spark data frame to the function, rather than arrays from a pandas data frame. When I converted my data to a pandas data frame using the toPandas() function as shown below, and then passed the pandas df to train_test_split, the issue was fixed.

pandas_df = spark_df.toPandas()

Error:

features_to_use = ['Feature1', 'Feature2']
# spark_df[...] is still a Spark object, so np.array() wraps the whole object
# as a 0-d object array instead of building a proper 2-D/1-D numeric array
x5D = np.array(spark_df[features_to_use])
y5D = np.array(spark_df['TargetFeature'])
X_train, X_test, y_train, y_test = train_test_split(x5D, y5D, train_size=0.8)

Fixed:

# With a pandas DataFrame, column selection returns real pandas objects,
# so np.array() builds arrays with a sample dimension that sklearn can split
import numpy as np
from sklearn.model_selection import train_test_split

pandas_df = spark_df.toPandas()
features_to_use = ['Feature1', 'Feature2']
x5D = np.array(pandas_df[features_to_use])
y5D = np.array(pandas_df['TargetFeature'])
X_train, X_test, y_train, y_test = train_test_split(x5D, y5D, train_size=0.8)