pandas Scikit-learn cross val score: too many indices for array

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/31995175/

Time: 2020-09-13 23:46:17  Source: igfitidea

python pandas scikit-learn

Question by dartdog

I have the following code

 import numpy as np
 from sklearn.ensemble import ExtraTreesClassifier
 # sklearn.cross_validation was removed in scikit-learn 0.20;
 # newer versions provide cross_val_score in sklearn.model_selection
 from sklearn.cross_validation import cross_val_score

 # combnum (a DataFrame), columns and label_columns are defined elsewhere (not shown)
 # split the dataset for train and test (roughly 75% / 25%)
 combnum['is_train'] = np.random.uniform(0, 1, len(combnum)) <= .75
 train, test = combnum[combnum['is_train']], combnum[~combnum['is_train']]

 et = ExtraTreesClassifier(n_estimators=200, max_depth=None, min_samples_split=10, random_state=0)

 labels = train[list(label_columns)].values
 tlabels = test[list(label_columns)].values

 features = train[list(columns)].values
 tfeatures = test[list(columns)].values

 et_score = cross_val_score(et, features, labels, n_jobs=-1)
 print("{0} -> ET: {1}".format(label_columns, et_score))

Checking the shape of the arrays:

 features.shape
 Out[19]:(43069, 34)

And

labels.shape
Out[20]:(43069, 1)

and I'm getting:

IndexError: too many indices for array

and this relevant part of the traceback:

---> 22 et_score = cross_val_score(et, features, labels, n_jobs=-1)

I'm creating the data from pandas DataFrames. Searching here, I found some references to errors that can occur with this approach, but I can't figure out how to correct it. Here is what the data arrays look like. features:

Out[21]:
array([[ 0.,  1.,  1., ...,  0.,  0.,  1.],
   [ 0.,  1.,  1., ...,  0.,  0.,  1.],
   [ 1.,  1.,  1., ...,  0.,  0.,  1.],
   ..., 
   [ 0.,  0.,  1., ...,  0.,  0.,  1.],
   [ 0.,  0.,  1., ...,  0.,  0.,  1.],
   [ 0.,  0.,  1., ...,  0.,  0.,  1.]])

labels

Out[22]:
array([[1],
   [1],
   [1],
   ..., 
   [1],
   [1],
   [1]])

Answer by YE LIANG HARRY

When we do cross-validation in scikit-learn, the process requires labels of shape (R,) instead of (R, 1). Although the two are the same thing to some extent, their indexing mechanisms are different. So in your case, just add:

# labels has shape (R, 1); keep the row count and drop the singleton column
n_rows, n_cols = labels.shape
labels = labels.reshape(n_rows,)

before passing it to the cross-validation function.
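
To see what "different indexing mechanisms" means in practice, here is a small NumPy sketch with made-up label values: indexing one row of an (R, 1) array yields a length-1 array rather than a scalar, while the reshape above (or the equivalent `ravel()` / `reshape(-1)`) produces the (R,) shape the answer recommends.

```python
import numpy as np

# Hypothetical labels shaped like the question's: R rows, 1 column
labels = np.array([[1], [0], [1], [1]])
print(labels.shape)   # (4, 1)
print(labels[0])      # [1]  -> a length-1 array, not a scalar

# The answer's fix: keep the row count and reshape to one dimension
n_rows, n_cols = labels.shape
flat = labels.reshape(n_rows,)
print(flat.shape)     # (4,)
print(flat[0])        # 1  -> a scalar

# Equivalent one-liners that give the same 1-D array
assert np.array_equal(flat, labels.ravel())
assert np.array_equal(flat, labels.reshape(-1))
```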

Answer by Bud

It seems to be fixable if you specify the target labels as a single column from pandas. If the target has multiple columns, I get a similar error. For example, try:

labels = train['Y']
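
A quick pandas check of the difference, using a hypothetical `train` frame with a target column `'Y'` (the data are invented): single brackets return a 1-D Series, while a list of column names, even a one-element list as in the question's `train[list(label_columns)]`, returns a 2-D DataFrame.

```python
import pandas as pd

# Hypothetical training frame with two features and a target column 'Y'
train = pd.DataFrame({"a": [0, 1, 1, 0], "b": [1, 1, 0, 0], "Y": [1, 0, 1, 1]})

labels_series = train["Y"]     # Series -> 1-D
labels_frame = train[["Y"]]    # DataFrame -> 2-D, even with one column

print(labels_series.shape)         # (4,)
print(labels_frame.shape)          # (4, 1)
print(labels_frame.values.shape)   # (4, 1), the same layout as the question's labels
```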

Answer by MSalty

Adding .ravel() to the Y/labels variable passed into the function helped solve this problem with KNN as well.
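
A minimal sketch of the `.ravel()` fix with a k-nearest-neighbours classifier. The data are random and purely illustrative, and `cv=3` and `n_neighbors=5` are arbitrary choices rather than values from the answer. `cross_val_score` is imported from `sklearn.model_selection`, where it lives in current scikit-learn releases (the question's `sklearn.cross_validation` module no longer exists there).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
features = rng.randint(0, 2, size=(60, 5)).astype(float)  # 60 rows, 5 binary features
labels = rng.randint(0, 2, size=(60, 1))                   # column vector, shape (60, 1)

knn = KNeighborsClassifier(n_neighbors=5)
# ravel() flattens (60, 1) to (60,), the 1-D target scikit-learn expects
scores = cross_val_score(knn, features, labels.ravel(), cv=3)
print(scores)  # one accuracy score per fold
```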

Answer by Yang Zhao

Try this target:

y=df['Survived'] 

Instead, I had used

y=df[['Survived']] 

which made the target y a DataFrame; it seems a Series works fine.
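
If the target has already been selected as a one-column DataFrame, it can also be turned back into a Series without re-selecting it. The sketch below swaps in `DataFrame.squeeze(axis="columns")`, a pandas method the answer does not mention; the frame and its `'Survived'` values are made up.

```python
import pandas as pd

# Hypothetical Titanic-style frame; only the target column matters here
df = pd.DataFrame({"Pclass": [3, 1, 3, 2], "Survived": [0, 1, 1, 0]})

y_frame = df[["Survived"]]               # DataFrame, shape (4, 1)
y = y_frame.squeeze(axis="columns")      # Series, shape (4,)

print(type(y).__name__, y.shape)         # Series (4,)
assert y.equals(df["Survived"])          # same result as single-bracket selection
```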

Answer by Gursel Karacor

You might need to play with the dimensions a bit, e.g.

et_score = cross_val_score(et, features, labels, n_jobs=-1)[:,n]

or

 et_score = cross_val_score(et, features, labels, n_jobs=-1)[n,:]

n being the dimension.