Python: How to solve "IndexError: too many indices for array"

Question

Asked by Sujoy De

My code below is giving me the following error "IndexError: too many indices for array". I am quite new to machine learning so I do not have any idea about how to solve this. Any kind of help would be appreciated.

import pandas
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import cross_val_score

train = pandas.read_csv("D:/...input/train.csv")

xTrain = train.iloc[:,0:54]
yTrain = train.iloc[:,54:]

clf = LogisticRegression(multi_class='multinomial')
scores = cross_val_score(clf, xTrain, yTrain, cv=10, scoring='accuracy')
print('****Results****')
print(scores.mean())

Answer 1

Accepted answer by LJ Codes

The error is basically saying that you are indexing the array with more indices than it has dimensions. I can't see the declaration of your array, but I'm assuming it's one-dimensional and the program is objecting to you treating it like a two-dimensional one.

Check that your declarations are correct, and test the code by printing the values after you've set them to confirm they are what you intend them to be.

There are a few existing questions on this subject already, so I'll just link one that might be helpful here: "IndexError: too many indices. Numpy Array with 1 row and 2 columns".
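
As a rough illustration of the check suggested above, the sketch below prints the number of dimensions of an array and then indexes it with one index too many. The array `y` and its values are made up for the example and are not taken from the question; the exact wording after "too many indices" varies between NumPy versions.

```python
import numpy as np

# Hypothetical 1-D array standing in for the target values.
y = np.array([0, 1, 1, 0])
print(y.ndim, y.shape)  # 1 (4,)

message = ""
try:
    y[0, 0]  # two indices on a 1-D array
except IndexError as exc:
    message = str(exc)

print(message)
```

Printing `ndim` and `shape` before indexing is usually enough to see whether the array has as many dimensions as the code assumes.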

Answer 2

Answered by Vetrivel PS

Step-by-step explanation of ML (machine learning) code with a pandas DataFrame:

1. Separating the predictor and target columns into X and y respectively.
2. Splitting the data into training data (X_train, y_train) and testing data (X_test, y_test).
3. Calculating the cross-validated AUC (area under the curve). This raised "IndexError: too many indices for array" because of y_train: a 1-D array was expected but a 2-D array was passed, which is a mismatch. After replacing y_train with y_train['y'], the code worked like a charm.

    # Import packages
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score

    # The original answer does not show how `classify` was built. Any scikit-learn
    # classifier fits here; LogisticRegression is only an assumed placeholder.
    classify = LogisticRegression()

    # Separate predictor and target columns into X and y.
    # df -> DataFrame loaded from the CSV file (not shown in the answer)
    data_X = df.drop(['y'], axis=1)
    data_y = pd.DataFrame(df['y'])  # 2-D: a one-column DataFrame

    # Stratified shuffle split: test_size=0.3 means 30% test data and 70% train data
    rs = StratifiedShuffleSplit(n_splits=2, test_size=0.3, random_state=2)

    for train_index, test_index in rs.split(data_X, data_y):
        # Select training and testing rows by position
        X_train, X_test = data_X.iloc[train_index], data_X.iloc[test_index]
        y_train, y_test = data_y.iloc[train_index], data_y.iloc[test_index]

        # 5-fold cross-validated AUC (cv=5). Passing the 2-D y_train raised the
        # IndexError, so the failing call is left commented out:
        # classify_cross_val_score = cross_val_score(classify, X_train, y_train, cv=5, scoring='roc_auc').mean()

        # Works after replacing y_train with y_train['y'], where 'y' is the only
        # column in the DataFrame (the target variable), which gives a 1-D Series:
        classify_cross_val_score = cross_val_score(classify, X_train, y_train['y'], cv=5, scoring='roc_auc').mean()
        print("Classify_Cross_Val_Score ", classify_cross_val_score)

        print(y_train.shape)       # 2-D
        print(y_train['y'].shape)  # 1-D

Output:

    Classify_Cross_Val_Score  0.7021433588790991
    (31647, 1) # 2-D
    (31647,)   # 1-D

Note: cross_val_score is imported from sklearn.model_selection and NOT from sklearn.cross_validation, which is deprecated.
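
The 2-D versus 1-D difference described in this answer can be checked without running any model. The sketch below is only an illustration: the frame `df` and its values are invented, and only the column name `y` follows the answer. It shows a few common ways to get a 1-D target out of a one-column DataFrame.

```python
import pandas as pd

# Hypothetical frame; 'y' is the target column, as in the answer above.
df = pd.DataFrame({"x1": [0.1, 0.2, 0.3], "y": [0, 1, 0]})

y_2d = pd.DataFrame(df["y"])      # one-column DataFrame, 2-D
y_series = y_2d["y"]              # select the column -> Series, 1-D
y_squeezed = y_2d.squeeze(axis=1) # drop the column axis -> Series, 1-D
y_array = y_2d.values.ravel()     # flatten to a NumPy array, 1-D

print(y_2d.shape)        # (3, 1)
print(y_series.shape)    # (3,)
print(y_squeezed.shape)  # (3,)
print(y_array.shape)     # (3,)
```

Any of the three 1-D forms should be acceptable as the target argument; selecting the column by name is the one the answer itself uses.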

Answer 3

Answered by TechBomb

You are getting this error because you are making the target array 'y' 2-D, when it actually needs to be 1-D to be passed to the cross-validation function.
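
Applied to the code in the question, this most likely comes down to how the target column is selected with iloc. The sketch below uses an all-zero frame as a hypothetical stand-in for train.csv (20 rows, 54 feature columns plus one target column, which is an assumption about the file's layout) and compares a slice with a single integer position.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: 20 rows, 55 columns.
train = pd.DataFrame(np.zeros((20, 55)))

xTrain = train.iloc[:, 0:54]

y_2d = train.iloc[:, 54:]  # slice -> one-column DataFrame (2-D), as in the question
y_1d = train.iloc[:, 54]   # single integer -> Series (1-D)

print(xTrain.shape)  # (20, 54)
print(y_2d.shape)    # (20, 1)
print(y_1d.shape)    # (20,)
```

Under that assumption, passing the 1-D form as yTrain is the change this answer is pointing at.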

These two cases are different:

1. y=numpy.zeros(shape=(len(list),1))
2. y=numpy.zeros(shape=(len(list)))

If you declare y as in case 1, y becomes 2-D. But you need a 1-D array, so use case 2.
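
The two cases can be compared directly by looking at the resulting shapes. In this sketch `n` is a hypothetical sample count standing in for len(list) above, and `ravel()` is shown as one way to flatten an array that was already created as in case 1.

```python
import numpy as np

n = 5  # hypothetical number of samples

y_case1 = np.zeros(shape=(n, 1))  # 2-D: n rows, 1 column
y_case2 = np.zeros(shape=(n,))    # 1-D: n elements

# An existing (n, 1) array can be flattened to 1-D.
y_flat = y_case1.ravel()

print(y_case1.shape)  # (5, 1)
print(y_case2.shape)  # (5,)
print(y_flat.shape)   # (5,)
```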

Answer 4

Answered by user8826621

While importing a dataset and displaying it with Matplotlib, I could preview an image with images[5540,:], where 5540 is the id of the image, but printing the label for that image with labels[5540,:] threw an error about too many indices.

I found out that labels is only a 1-D array, while I was indexing it as if it were a 2-D array, so the statement used more indices than the array has dimensions and it threw the error.

The solution that worked for me was labels[5540,].
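
A small sketch of the same situation, with made-up arrays in place of the real dataset (the sizes and the index 3 are arbitrary assumptions): `images` is 2-D, so a row-and-column index works, while `labels` is 1-D and accepts only a single index.

```python
import numpy as np

# Hypothetical data: 10 flattened images and one label per image.
images = np.zeros((10, 784))
labels = np.arange(10)

row = images[3, :]        # fine: images is 2-D
label = labels[3]         # fine: one index for a 1-D array
same_label = labels[3,]   # a one-element index tuple, equivalent to labels[3]

raised = False
try:
    labels[3, :]          # two indices on a 1-D array
except IndexError:
    raised = True

print(row.shape, label, same_label, raised)
```

Plain `labels[3]` is the more conventional spelling; the trailing comma in `labels[3,]` only wraps the single index in a tuple.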
