pandas Scikit-learn cross_val_score: too many indices for array

Question

Asked by dartdog

I have the following code

 import numpy as np
 from sklearn.ensemble import ExtraTreesClassifier
 from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in scikit-learn >= 0.18

 # split the dataset into train and test
 combnum['is_train'] = np.random.uniform(0, 1, len(combnum)) <= .75
 train, test = combnum[combnum['is_train']], combnum[~combnum['is_train']]

 et = ExtraTreesClassifier(n_estimators=200, max_depth=None,
                           min_samples_split=10, random_state=0)

 labels = train[list(label_columns)].values
 tlabels = test[list(label_columns)].values

 features = train[list(columns)].values
 tfeatures = test[list(columns)].values

 et_score = cross_val_score(et, features, labels, n_jobs=-1)
 print("{0} -> ET: {1}".format(label_columns, et_score))

Checking the shape of the arrays:

 features.shape
 Out[19]:(43069, 34)

And

labels.shape
Out[20]:(43069, 1)

and I'm getting:

IndexError: too many indices for array

and this relevant part of the traceback:

---> 22 et_score = cross_val_score(et, features, labels, n_jobs=-1)

I'm creating the data from pandas DataFrames. I searched here and saw some references to possible errors with this approach, but I can't figure out how to correct it. Here is what the data arrays look like. features:

Out[21]:
array([[ 0.,  1.,  1., ...,  0.,  0.,  1.],
   [ 0.,  1.,  1., ...,  0.,  0.,  1.],
   [ 1.,  1.,  1., ...,  0.,  0.,  1.],
   ..., 
   [ 0.,  0.,  1., ...,  0.,  0.,  1.],
   [ 0.,  0.,  1., ...,  0.,  0.,  1.],
   [ 0.,  0.,  1., ...,  0.,  0.,  1.]])

labels

Out[22]:
array([[1],
   [1],
   [1],
   ..., 
   [1],
   [1],
   [1]])

Answer 1

Answered by YE LIANG HARRY

When we do cross-validation in scikit-learn, the process requires labels of shape (R,) rather than (R, 1). Although the two hold the same data, they are indexed differently. So in your case, just add:

n_rows, n_cols = labels.shape
labels = labels.reshape(n_rows,)

before passing it to the cross-validation function.
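
To make the shape change concrete, here is a minimal sketch using a small made-up array in place of the question's `labels` (the array contents and the names `flat_a`, `flat_b`, `flat_c` are assumptions for illustration). It shows three NumPy calls that should all turn an (R, 1) column into an (R,) vector:

```python
import numpy as np

# Hypothetical stand-in for the question's `labels`: a column vector.
labels = np.ones((5, 1), dtype=int)
print(labels.shape)  # (5, 1)

# Option 1: reshape using the row count, as in the answer above.
flat_a = labels.reshape(labels.shape[0],)

# Option 2: let NumPy infer the length with -1.
flat_b = labels.reshape(-1)

# Option 3: ravel() flattens to 1-D.
flat_c = labels.ravel()

print(flat_a.shape, flat_b.shape, flat_c.shape)  # (5,) (5,) (5,)
```

Any of the three results could then be passed as the label argument.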

Answer 2

Answered by Bud

This seems to be fixable if you specify the target labels as a single column from pandas. If the target has multiple columns, I get a similar error. For example, try:

labels = train['Y']

Answer 3

Answered by MSalty

Adding .ravel() to the Y/labels variable passed into the function helped solve this problem with KNN as well.
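
As a rough sketch of that approach with KNN: the data below is synthetic, and the import path `sklearn.model_selection` assumes scikit-learn 0.18 or later (the question used the older `sklearn.cross_validation`). The names `X`, `y`, and `knn` are made up for the example.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(60, 4)                              # synthetic features, shape (60, 4)
y = (X[:, 0] > 0.5).astype(int).reshape(-1, 1)   # column vector, shape (60, 1)

knn = KNeighborsClassifier(n_neighbors=3)

# Flatten the labels to shape (60,) before handing them to cross_val_score.
scores = cross_val_score(knn, X, y.ravel(), cv=3)
print(scores)
```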

Answer 4

Answered by Yang Zhao

Try defining the target as:

y=df['Survived']

I had instead used

y=df[['Survived']]

which made the target y a DataFrame. A Series seems to work fine.
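
The difference between the two selections can be checked directly. This is a small sketch on a made-up frame; only the column name 'Survived' comes from the answer, the rest is assumed for illustration:

```python
import pandas as pd

# Hypothetical toy frame; only the 'Survived' column name is from the answer.
df = pd.DataFrame({"Age": [22, 38, 26, 35], "Survived": [0, 1, 1, 1]})

y_series = df["Survived"]    # single brackets select a Series
y_frame = df[["Survived"]]   # double brackets select a one-column DataFrame

print(type(y_series).__name__, y_series.shape)  # Series (4,)
print(type(y_frame).__name__, y_frame.shape)    # DataFrame (4, 1)

# .values on the DataFrame gives an (R, 1) array; ravel() makes it (R,).
print(y_frame.values.ravel().shape)             # (4,)
```

The same applies to the question's `train[list(label_columns)].values`: selecting with a list of column names keeps the second axis even when the list has one entry.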

Answer 5

Answered by Gursel Karacor

You might need to play with the dimensions a bit, e.g.

et_score = cross_val_score(et, features, labels, n_jobs=-1)[:,n]

or

 et_score = cross_val_score(et, features, labels, n_jobs=-1)[n,:]

n being the dimension.
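
Before indexing the result, it may help to check its shape. The sketch below uses synthetic data and assumes scikit-learn 0.18 or later for the `sklearn.model_selection` import. With single-metric scoring, `cross_val_score` returns a one-dimensional array with one score per fold, so a two-axis index such as `[:, n]` would itself raise an IndexError, while `et_score[n]` selects fold n:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(60, 4)               # synthetic features
y = (X[:, 0] > 0.5).astype(int)   # 1-D labels, shape (60,)

et = ExtraTreesClassifier(n_estimators=10, random_state=0)
et_score = cross_val_score(et, X, y, cv=3)

print(et_score.ndim, et_score.shape)  # one score per fold
first_fold = et_score[0]              # single-axis indexing works
mean_score = et_score.mean()

# Two-axis indexing on the 1-D result fails.
try:
    et_score[:, 0]
    two_axis_ok = True
except IndexError:
    two_axis_ok = False
print(two_axis_ok)
```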
