Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/37632550/

Time: 2020-08-19 19:42:08  Source: igfitidea

Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required

Tags: python, numpy, scikit-learn

Asked by egorlitvinenko

I'm testing a simple prediction program with Python 2.7, sklearn 0.17.1, and numpy 1.11.0. I got a matrix of probabilities from an LDA model, and now I want to create a RandomForestClassifier that predicts results from those probabilities. My code is:

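import numpy as np
from sklearn.ensemble import RandomForestClassifier
# sklearn 0.17 location; in 0.18+ this lives in sklearn.model_selection
from sklearn.cross_validation import cross_val_score

maxlen = 40  # length of each feature row (one slot per topic id)
props = []
for doc in corpus:
    # topics is a list of (topic_id, probability) pairs
    topics = model.get_document_topics(doc)
    tprops = [0] * maxlen
    for topic in topics:
        # index by the current pair, not by the whole list
        tprops[topic[0]] = topic[1]
    props.append(tprops)

ntheta = np.array(props)
ny = np.array(y)

clf = RandomForestClassifier(n_estimators=100)
accuracy = cross_val_score(clf, ntheta, ny, scoring='accuracy')
print(accuracy)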


ValueError                                Traceback (most recent call last)
<ipython-input-65-a7d276df43e9> in <module>()
      1 # clf.fit(nteta, ny)
      2 print nteta.shape, ny.shape
----> 3 accuracy = cross_val_score(clf, nteta, ny, scoring = 'accuracy')
      4 print accuracy

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
   1431                                               train, test, verbose, None,
   1432                                               fit_params)
-> 1433                       for train, test in cv)
   1434     return np.array(scores)[:, 0]
   1435 

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
    798             # was dispatched. In particular this covers the edge
    799             # case of Parallel used with an exhausted iterator.
--> 800             while self.dispatch_one_batch(iterator):
    801                 self._iterating = True
    802             else:

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
    656                 return False
    657             else:
--> 658                 self._dispatch(tasks)
    659                 return True
    660 

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
    564 
    565         if self._pool is None:
--> 566             job = ImmediateComputeBatch(batch)
    567             self._jobs.append(job)
    568             self.n_dispatched_batches += 1

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, batch)
    178         # Don't delay the application, to avoid keeping the input
    179         # arguments in memory
--> 180         self.results = batch()
    181 
    182     def get(self):

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
     73 
     74     def __len__(self):

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
   1529             estimator.fit(X_train, **fit_params)
   1530         else:
-> 1531             estimator.fit(X_train, y_train, **fit_params)
   1532 
   1533     except Exception as e:

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/ensemble/forest.pyc in fit(self, X, y, sample_weight)
    210         """
    211         # Validate or convert input data
--> 212         X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    213         if issparse(X):
    214             # Pre-sort indices to avoid that each individual tree of the

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    405                              " minimum of %d is required%s."
    406                              % (n_samples, shape_repr, ensure_min_samples,
--> 407                                 context))
    408 
    409     if ensure_min_features > 0 and array.ndim == 2:

ValueError: Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required.
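
Before cross-validating, it can help to confirm that the feature matrix and the label vector have the shapes you expect. The following is a minimal sketch, not the asker's code: `dense_topic_row`, `fake_topics` and `fake_labels` are hypothetical names, and the stand-in data assumes that `get_document_topics` returns `(topic_id, probability)` pairs, as gensim's LDA model does.

```python
import numpy as np

def dense_topic_row(topic_pairs, num_topics):
    """Expand sparse (topic_id, probability) pairs into a fixed-length row."""
    row = [0.0] * num_topics
    for topic_id, prob in topic_pairs:
        row[topic_id] = prob
    return row

# Hypothetical stand-in for model.get_document_topics(doc) on three documents.
fake_topics = [
    [(0, 0.7), (3, 0.3)],
    [(1, 1.0)],
    [(2, 0.5), (3, 0.5)],
]
fake_labels = [0, 1, 0]  # hypothetical class labels, one per document

X = np.array([dense_topic_row(pairs, 4) for pairs in fake_topics])
y = np.array(fake_labels)

# One row per document, one column per topic, and one label per row.
print(X.shape, y.shape)
assert X.shape[0] == y.shape[0]
```

If the shapes look right at this point but a fold still ends up with 0 samples, the train/test split itself is the next thing to inspect.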


UPD: Why did this get two downvotes? Please keep the criticism constructive.

UPD: cotique found that y was filled incorrectly (it should have held different classes). When y is filled correctly, the problem does not occur. In my case the classes were wrong, and there were 39774 of them. Still, that does not really explain why the error occurs when there are 39774 classes to predict.
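
A quick way to spot a mis-filled y is to count its distinct classes and the size of the smallest one. This is a rough diagnostic sketch: `summarize_labels` is a hypothetical helper, and `n_folds=3` is an assumption matching the default fold count of `cross_val_score` in sklearn 0.17. When almost every sample has its own class, a stratified split cannot place each class in both the training and the test part, which may be related to the empty training array reported above.

```python
import numpy as np

def summarize_labels(y, n_folds=3):
    """Return (n_samples, n_classes, smallest_class_size, enough_per_class)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    smallest = int(counts.min())
    # enough_per_class is True when every class has at least n_folds members
    return len(y), len(classes), smallest, smallest >= n_folds

# Hypothetical labels: every sample has its own class, as in a mis-filled y.
bad_y = np.arange(10)
# Hypothetical labels: two classes with five samples each.
good_y = np.array([0, 1] * 5)

print(summarize_labels(bad_y))
print(summarize_labels(good_y))
```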

Answered by cotique

This is the original code from the scikit-learn repo (validation.py#L409):

if ensure_min_samples > 0:
    n_samples = _num_samples(array)
    if n_samples < ensure_min_samples:
        raise ValueError("Found array with %d sample(s) (shape=%s) while a"
                         " minimum of %d is required%s."
                         % (n_samples, shape_repr, ensure_min_samples,
                            context))

So n_samples = _num_samples(array). By the way, array is the input object to check / convert.

Next, validation.py#L111:

def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
        pass  # stuff (elided)
    if not hasattr(x, '__len__') and not hasattr(x, 'shape'):
        pass  # stuff (elided)
    if hasattr(x, 'shape'):
        if len(x.shape) == 0:
            raise TypeError  # message elided
        return x.shape[0]
    else:
        return len(x)

So the number of samples equals the length of the first dimension of array, which is 0 since array.shape = (0, 40).
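
The two excerpts above can be condensed into a small stand-alone sketch. This is a simplified imitation, not scikit-learn's actual implementation: `num_samples` and `check_min_samples` are hypothetical names, and only the `shape` and `len` branches are modeled.

```python
import numpy as np

def num_samples(x):
    """Simplified stand-in: length of the first dimension of an array-like."""
    if hasattr(x, 'shape'):
        if len(x.shape) == 0:
            raise TypeError("0-d array has no samples")
        return x.shape[0]
    return len(x)

def check_min_samples(array, min_samples=1):
    """Raise ValueError when array has fewer than min_samples rows."""
    n = num_samples(array)
    if n < min_samples:
        raise ValueError(
            "Found array with %d sample(s) (shape=%s) while a minimum of %d "
            "is required." % (n, np.shape(array), min_samples))
    return array

# Zero rows and 40 feature columns, like the array in the traceback.
empty_train = np.empty((0, 40))
print(num_samples(empty_train))

try:
    check_min_samples(empty_train)
except ValueError as err:
    message = str(err)
    print(message)
```

An array can have a valid number of columns and still hold no rows, so the check only looks at the first dimension.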

And I don't know what this all means, but I hope it makes things clearer.
