Original URL: http://stackoverflow.com/questions/37632550/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required
Question by egorlitvinenko
I'm testing a simple prediction program with Python 2.7, sklearn 0.17.1 and numpy 1.11.0. I got a matrix of probabilities from an LDA model, and now I want to create a RandomForestClassifier that predicts results from these probabilities. My code is:
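import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import cross_val_score  # sklearn 0.17; in 0.18+ use sklearn.model_selection

maxlen = 40  # number of LDA topics
props = []
for doc in corpus:
    # each entry is a (topic_id, probability) pair
    topics = model.get_document_topics(doc)
    tprops = [0] * maxlen
    for topic in topics:
        tprops[topic[0]] = topic[1]  # use the loop variable, not the whole list
    props.append(tprops)
ntheta = np.array(props)
ny = np.array(y)
clf = RandomForestClassifier(n_estimators=100)
accuracy = cross_val_score(clf, ntheta, ny, scoring='accuracy')
print(accuracy)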
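For reference, here is a self-contained sketch of the same dense-matrix step, run on made-up topic lists instead of a real gensim model. It assumes that each document yields a list of (topic_id, probability) pairs, as get_document_topics does; the helper name topics_to_dense and the sample data are hypothetical.

```python
def topics_to_dense(doc_topics, num_topics):
    """Turn sparse (topic_id, probability) pairs into a fixed-length row.

    Topics missing from doc_topics (e.g. ones below the model's
    probability threshold) stay at 0.0.
    """
    row = [0.0] * num_topics
    for topic_id, prob in doc_topics:
        row[topic_id] = prob
    return row


# Hypothetical output of get_document_topics() for three documents.
fake_corpus_topics = [
    [(0, 0.7), (3, 0.3)],
    [(1, 1.0)],
    [],  # a document where no topic passed the threshold
]

num_topics = 4
matrix = [topics_to_dense(t, num_topics) for t in fake_corpus_topics]
n_rows, n_cols = len(matrix), len(matrix[0])
print("%d rows x %d columns" % (n_rows, n_cols))  # 3 rows x 4 columns
```

Every row has exactly num_topics entries, so for a 40-topic model the resulting NumPy array has shape (number of documents, 40).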
ValueError Traceback (most recent call last)
<ipython-input-65-a7d276df43e9> in <module>()
1 # clf.fit(nteta, ny)
2 print nteta.shape, ny.shape
----> 3 accuracy = cross_val_score(clf, nteta, ny, scoring = 'accuracy')
4 print accuracy
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
1431 train, test, verbose, None,
1432 fit_params)
-> 1433 for train, test in cv)
1434 return np.array(scores)[:, 0]
1435
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
798 # was dispatched. In particular this covers the edge
799 # case of Parallel used with an exhausted iterator.
--> 800 while self.dispatch_one_batch(iterator):
801 self._iterating = True
802 else:
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
656 return False
657 else:
--> 658 self._dispatch(tasks)
659 return True
660
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
564
565 if self._pool is None:
--> 566 job = ImmediateComputeBatch(batch)
567 self._jobs.append(job)
568 self.n_dispatched_batches += 1
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, batch)
178 # Don't delay the application, to avoid keeping the input
179 # arguments in memory
--> 180 self.results = batch()
181
182 def get(self):
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
1529 estimator.fit(X_train, **fit_params)
1530 else:
-> 1531 estimator.fit(X_train, y_train, **fit_params)
1532
1533 except Exception as e:
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/ensemble/forest.pyc in fit(self, X, y, sample_weight)
210 """
211 # Validate or convert input data
--> 212 X = check_array(X, dtype=DTYPE, accept_sparse="csc")
213 if issparse(X):
214 # Pre-sort indices to avoid that each individual tree of the
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
405 " minimum of %d is required%s."
406 % (n_samples, shape_repr, ensure_min_samples,
--> 407 context))
408
409 if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required.
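The traceback shows that the empty array reached RandomForestClassifier.fit() from inside cross_val_score, so it is a training split, not the full ntheta. A quick, library-free sanity check of the labels before calling cross_val_score could look like the sketch below; the helper name describe_labels and the sample labels are hypothetical, and n_folds=3 assumes the 3-fold default that cross_val_score used in sklearn 0.17.

```python
from collections import Counter


def describe_labels(y, n_folds=3):
    """Summarise a label vector before stratified cross-validation."""
    counts = Counter(y)
    smallest = min(counts.values()) if counts else 0
    return {
        "n_samples": len(y),
        "n_classes": len(counts),
        "smallest_class": smallest,
        # Stratified k-fold only works cleanly when every class has
        # at least n_folds members.
        "stratifiable": bool(counts) and smallest >= n_folds,
    }


# Hypothetical labels: one unique label per document (like document ids).
bad_y = ["doc%d" % i for i in range(6)]
good_y = ["sport", "politics", "sport", "politics", "sport", "politics"]

print(describe_labels(bad_y))
print(describe_labels(good_y))
```

An n_classes value close to n_samples is a strong hint that y holds identifiers rather than class labels.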
UPD: Why did I get two downvotes? Please keep the criticism constructive.
UPD: cotique found that y was filled incorrectly (it should have held other classes). If y is filled correctly, the problem doesn't happen. In my case the classes were wrong, and there were 39774 of them. But in theory this is not an answer: why does the error happen when we have 39774 classes to predict?
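One possible explanation, offered as a reading of the old library behaviour rather than something confirmed in this thread: in sklearn 0.17, cross_val_score used a 3-fold StratifiedKFold for classifiers, and that splitter assigned each class's members to test folds separately, starting from fold 0. A class with fewer members than folds simply had nothing in the later folds. If every label is unique, every sample lands in test fold 0, so the training set of fold 0 is empty, which would match shape (0, 40). The sketch below is a simplified, hypothetical model of that assignment (assign_test_folds and train_sizes are not library functions).

```python
def assign_test_folds(labels, n_folds=3):
    """Simplified model of per-class, unshuffled stratified fold assignment.

    Each class is split into n_folds contiguous chunks, KFold-style; a class
    with fewer than n_folds members is padded to n_folds slots and the
    unused slots are dropped, so its members fill folds 0, 1, ... in order.
    """
    fold_of = [None] * len(labels)
    for label in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == label]
        n = max(len(members), n_folds)
        # KFold-style fold sizes: the first n % n_folds folds get one extra slot.
        sizes = [n // n_folds + (1 if k < n % n_folds else 0)
                 for k in range(n_folds)]
        slot = 0
        for fold, size in enumerate(sizes):
            for _ in range(size):
                if slot < len(members):  # drop padding slots
                    fold_of[members[slot]] = fold
                slot += 1
    return fold_of


def train_sizes(labels, n_folds=3):
    """Number of training samples left for each fold."""
    folds = assign_test_folds(labels, n_folds)
    return [sum(1 for f in folds if f != k) for k in range(n_folds)]


unique_y = list(range(6))         # every label unique (hypothetical)
balanced_y = [0, 0, 0, 1, 1, 1]   # two classes, three members each

print(train_sizes(unique_y))    # [0, 6, 6]
print(train_sizes(balanced_y))  # [4, 4, 4]
```

As far as I know, recent scikit-learn versions of StratifiedKFold raise a ValueError when n_splits is greater than the number of members in every class (and only warn when some classes are that small), so the same bad y fails with a clearer message there.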
Answer by cotique
This is the original code from the scikit-learn repo (validation.py#L409):
if ensure_min_samples > 0:
    n_samples = _num_samples(array)
    if n_samples < ensure_min_samples:
        raise ValueError("Found array with %d sample(s) (shape=%s) while a"
                         " minimum of %d is required%s."
                         % (n_samples, shape_repr, ensure_min_samples,
                            context))
So n_samples = _num_samples(array). By the way, array is the "input object to check / convert".
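To see how the message in the question is produced, here is a stripped-down, standalone version of that check. It works on a plain shape tuple instead of an array, and the function name check_min_samples is hypothetical; the real check also appends an optional context suffix, which is empty here.

```python
def check_min_samples(shape, ensure_min_samples=1, context=""):
    """Mimic the ensure_min_samples check on an array shape tuple."""
    n_samples = shape[0]  # samples are counted along the first axis
    if ensure_min_samples > 0 and n_samples < ensure_min_samples:
        raise ValueError("Found array with %d sample(s) (shape=%s) while a"
                         " minimum of %d is required%s."
                         % (n_samples, shape, ensure_min_samples, context))
    return n_samples


message = None
try:
    check_min_samples((0, 40))
except ValueError as exc:
    message = str(exc)
print(message)
```

This prints the same text as the last line of the traceback, because a (0, 40) shape has 0 entries along the first axis.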
Next, validation.py#L111:
def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
        # stuff
    if not hasattr(x, '__len__') and not hasattr(x, 'shape'):
        # stuff
    if hasattr(x, 'shape'):
        if len(x.shape) == 0:
            # raise TypeError
        return x.shape[0]
    else:
        return len(x)
So the number of samples equals the length of the first dimension of array, which is 0, since array.shape = (0, 40).
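A minimal sketch of that counting rule, for illustration only: the name num_samples and the FakeArray class are hypothetical stand-ins (FakeArray mimics just the shape attribute of a NumPy array), and the error branches of the real _num_samples are left out.

```python
class FakeArray(object):
    """Hypothetical stand-in exposing only a NumPy-like shape attribute."""

    def __init__(self, shape):
        self.shape = shape


def num_samples(x):
    """Count samples the way the excerpt above does (error cases omitted)."""
    if hasattr(x, "shape"):
        return x.shape[0]  # first dimension = number of rows/samples
    return len(x)  # plain sequences: one element per sample


print(num_samples(FakeArray((0, 40))))        # 0
print(num_samples(FakeArray((3, 40))))        # 3
print(num_samples([[0.1] * 40, [0.2] * 40]))  # 2
```

A 0 x 40 array still has 40 features but zero samples, which is exactly what ensure_min_samples rejects.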
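In other words, the X_train that cross_val_score passed to RandomForestClassifier.fit() for one of the folds had no rows. I don't know what causes that in your case, but I hope it makes things clearer.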