Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/43222882/
Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection
Asked by manisha
I'm not sure how to fix this. Any help is much appreciated. I saw the question "Vectorization: Not a valid collection", but I'm not sure I understood it.
train = df1.iloc[:,[4,6]]
target =df1.iloc[:,[0]]

def train(classifier, X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
    classifier.fit(X_train, y_train)
    print ("Accuracy: %s" % classifier.score(X_test, y_test))
    return classifier

trial1 = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', MultinomialNB()),])

train(trial1, train, target)
The error is below:
----> 6 train(trial1, train, target)
<ipython-input-140-ac0e8d32795e> in train(classifier, X, y)
1 def train(classifier, X, y):
----> 2 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
3
4 classifier.fit(X_train, y_train)
5 print ("Accuracy: %s" % classifier.score(X_test, y_test))
/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
1687 test_size = 0.25
1688
-> 1689 arrays = indexable(*arrays)
1690
1691 if stratify is not None:
/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in indexable(*iterables)
204 else:
205 result.append(np.array(X))
--> 206 check_consistent_length(*result)
207 return result
208
/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
175 """
176
--> 177 lengths = [_num_samples(X) for X in arrays if X is not None]
178 uniques = np.unique(lengths)
179 if len(uniques) > 1:
/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in <listcomp>(.0)
175 """
176
--> 177 lengths = [_num_samples(X) for X in arrays if X is not None]
178 uniques = np.unique(lengths)
179 if len(uniques) > 1:
/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in _num_samples(x)
124 if len(x.shape) == 0:
125 raise TypeError("Singleton array %r cannot be considered"
--> 126 " a valid collection." % x)
127 return x.shape[0]
128 else:
TypeError: Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection.
Accepted answer by Vivek Kumar
This error arises because your function train shadows your variable train, so the function is passed to itself as the X argument.
Explanation:
You define a variable train like this:
train = df1.iloc[:,[4,6]]
Then, a few lines later, you define a function train like this:
def train(classifier, X, y):
So what actually happens is that the name train is rebound to the new definition. That means train no longer points to the DataFrame object as you wanted, but to the function you defined. The error message makes this clear:
array(<function train at 0x7f3a311320d0>, dtype=object)
Note the function train inside the error message.
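To see why a function produces the "singleton array" wording, here is a minimal sketch that repeats the two steps visible in the traceback: np.array(X) in indexable and the len(x.shape) == 0 check in _num_samples. It assumes only NumPy; the function body is a placeholder, and other scikit-learn versions may implement the check differently.

```python
import numpy as np

def train(classifier, X, y):
    # Placeholder body; only the function object itself matters here.
    return classifier

# indexable() in the traceback wraps each input with np.array(X).
arr = np.array(train)

# A function is not a sequence, so NumPy stores it as a single
# zero-dimensional object element with an empty shape.
is_singleton = len(arr.shape) == 0  # the condition checked in _num_samples
```

An object with an empty shape has no first dimension to count as samples, which is why the check refuses it.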
Solution:
Rename one of them (the variable or the function).
Suggestion: rename the function to something like training or training_func.
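A small sketch of the problem and the rename, using plain lists as hypothetical stand-ins for the DataFrame slices and a string in place of the pipeline. The names train_data and training_func are illustrative choices, not part of the original code.

```python
# Problem: one name, two bindings.
train = ["doc one", "doc two"]  # hypothetical stand-in for df1.iloc[:, [4, 6]]

def train(classifier, X, y):
    # This definition rebinds `train`; the list above is no longer
    # reachable under that name.
    return X

shadowed = train("clf", train, [0, 1])  # X is the function itself
passed_function = shadowed is train

# Fix: give the data and the function distinct names.
train_data = ["doc one", "doc two"]

def training_func(classifier, X, y):
    return X

fixed = training_func("clf", train_data, [0, 1])
```

With distinct names, the call receives the data rather than the function object.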
Answer by dopexxx
I got the same error in another context (sklearn train_test_split). The reason was simply that I had passed a positional argument as a keyword argument, which led to it being misinterpreted in the called function.
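One plausible reading of this, sketched with a hypothetical stand-in function (split_like is not a scikit-learn API) that copies the train_test_split(*arrays, **options) signature shown in the traceback: an option passed positionally is collected as one more array, and a bare number is then treated as a collection to split.

```python
def split_like(*arrays, **options):
    """Hypothetical stand-in that mirrors the signature in the traceback."""
    return len(arrays), options

X = [[1], [2], [3], [4]]
y = [0, 1, 0, 1]

# 0.25 passed positionally ends up in *arrays as a third "array".
n_arrays_wrong, opts_wrong = split_like(X, y, 0.25)

# Passed by keyword, it is received as an option instead.
n_arrays_right, opts_right = split_like(X, y, test_size=0.25)
```

In the real function, the stray scalar would go through the same length check that rejected the function object in the question.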
Answer by Info5ek
A variation on the first answer: another reason you could get this error is that a column name in your data is the same as an attribute or method of the object containing the data.
In my case, I was trying to access the column "count" in the dataframe "df" with the ostensibly legal syntax df.count.
However, count is a method of pandas DataFrame objects, so df.count returns that method rather than the column. The resulting name collision creates the (rather befuddling) error.
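A short sketch of that collision, assuming pandas is installed. The data and the "clicks" column are made up for illustration.

```python
import pandas as pd

# Hypothetical data with a column named "count".
df = pd.DataFrame({"count": [3, 1, 2], "clicks": [10, 20, 30]})

attribute_access = df.count           # the DataFrame.count method, not the column
other_column = df.clicks.tolist()     # attribute access works when nothing clashes
column_values = df["count"].tolist()  # bracket access always returns the column
```

Bracket access avoids the ambiguity, so it is the safer habit for column names you do not control.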
Answer by Orhan Celik
I got the same error from train_test_split in sklearn.model_selection, but in my case the reason was that I was passing an array derived from a Spark DataFrame to the function, not an array from a pandas DataFrame. When I converted my data to a pandas DataFrame using the toPandas() function, as below, and then passed the pandas DataFrame to train_test_split, the issue was fixed.
pandas_df=spark_df.toPandas()
Code that raised the error:
features_to_use = ['Feature1', 'Feature2']
x5D = np.array(spark_df[ features_to_use ])
y5D = np.array(spark_df['TargetFeature'])
X_train, X_test, y_train, y_test = train_test_split(x5D, y5D, train_size=0.8)
Fixed version:
pandas_df=spark_df.toPandas()
features_to_use = ['Feature1', 'Feature2']
x5D = np.array(pandas_df[ features_to_use ])
y5D = np.array(pandas_df['TargetFeature'])
X_train, X_test, y_train, y_test = train_test_split(x5D, y5D, train_size=0.8)