How to fix "ValueError: Expected 2D array, got 1D array instead" in sklearn/python?

Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/46638641/

Date: 2020-08-19 17:44:44  Source: igfitidea

python, arrays, numpy, scikit-learn

Asked by Karthik Bhojaraj

Hi there. I just started with machine learning and am working through a simple example to learn. I want to classify the files on my disk by file type using a classifier. The code I have written is:

import sklearn
import numpy as np


#Importing a local data set from the desktop
import pandas as pd
mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
print mydata


x_train = mydata.script
y_train = mydata.label

#print x_train
#print y_train
x_test = mydata.script

from sklearn import tree
classi = tree.DecisionTreeClassifier()

classi.fit(x_train, y_train)

predictions = classi.predict(x_test)
print predictions

And this is the output, including the error I am getting:

  script  class  div   label
0       5      6    7    html
1       0      0    0  python
2       1      1    1     csv
Traceback (most recent call last):
  File "newtest.py", line 21, in <module>
    classi.fit(x_train, y_train)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 410, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 5.  0.  1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

If anyone can help me with the code, it would be very helpful!

Answered by cs95

When passing your input to a classifier, pass 2D arrays (of shape (M, N), where N >= 1), not 1D arrays (which have shape (N,)). The error message is pretty clear:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
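
To make the two options in that message concrete, here is a minimal sketch using the array printed in the error. The variable names are only for illustration and are not from the original post.

```python
import numpy as np

# The 1D array from the error message: three samples of a single feature.
a = np.array([5.0, 0.0, 1.0])
print(a.shape)          # (3,)

# Single feature: one column, one row per sample.
as_column = a.reshape(-1, 1)
print(as_column.shape)  # (3, 1)

# Single sample: one row, one column per feature.
as_row = a.reshape(1, -1)
print(as_row.shape)     # (1, 3)
```

For the question's data, where "script" is one feature measured on several files, the column form is the one that applies.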

from sklearn.model_selection import train_test_split

# X.shape should be (N, M) where M >= 1; the double brackets keep X as a 2D DataFrame
X = mydata[['script']]
# y should be 1D, with shape (N,)
y = mydata['label']
# perform label encoding if "label" contains strings (factorize returns a 1D array of codes)
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
                      X, y, test_size=0.33, random_state=42)
...

clf.fit(X_train, y_train) 
print(clf.score(X_test, y_test))
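
As a self-contained illustration of the same idea, the sketch below rebuilds the three rows printed in the question as an in-memory DataFrame (a stand-in for file_format.csv, which is an assumption here) and fits the tree on the 2D selection. With only three rows there is nothing to hold out, so it only shows that the shapes are accepted.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for file_format.csv, using the rows printed in the question.
mydata = pd.DataFrame({
    "script": [5, 0, 1],
    "class": [6, 0, 1],
    "div": [7, 0, 1],
    "label": ["html", "python", "csv"],
})

X_1d = mydata["script"]    # Series, shape (3,): the form that triggers the ValueError
X_2d = mydata[["script"]]  # DataFrame, shape (3, 1): the form fit() expects
y = mydata["label"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_2d, y)

# Predicting on the training rows only confirms the shapes work;
# it says nothing about how well the model generalizes.
predictions = clf.predict(X_2d)
print(predictions)
```

The only difference between the failing and working selections is the extra pair of brackets: mydata["script"] returns a Series, while mydata[["script"]] returns a one-column DataFrame.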

Some other helpful tips:

  1. Split your data into separate train and test portions. Do not test on your training data; that gives an overly optimistic estimate of how well your classifier performs.
  2. I'd recommend factorizing your labels so that you're dealing with integers. It's just easier.
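
For the second tip, here is a small sketch of what factorizing looks like with pandas. The label values are made up for illustration; pd.factorize assigns integer codes in order of first appearance and also returns the unique values, which can be used to map predictions back to strings.

```python
import pandas as pd

# Hypothetical label column, for illustration only.
labels = pd.Series(["html", "python", "csv", "html"])

# factorize returns integer codes plus the unique values they refer to.
codes, uniques = pd.factorize(labels)
print(codes)          # [0 1 2 0]
print(list(uniques))  # ['html', 'python', 'csv']

# Map integer codes (for example, model predictions) back to label strings.
decoded = uniques[codes]
print(list(decoded))  # ['html', 'python', 'csv', 'html']
```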

Answered by Ameya Marathe

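from sklearn.linear_model import LinearRegression

# dataset is assumed to be a pandas DataFrame loaded earlier
X = dataset.iloc[:, 0].values  # 1D array, shape (N,)
y = dataset.iloc[:, 1].values

regressor = LinearRegression()
# reshape returns a new array, so assign the result back to X
X = X.reshape(-1, 1)           # 2D array, shape (N, 1)
regressor.fit(X, y)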

I had the code above. reshape is not an in-place operation: it returns a new array and leaves the original unchanged. So you have to assign the reshaped result back to the variable, as in the code above.
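
A quick way to see this for yourself is the sketch below, which uses a small made-up array rather than the dataset from the answer.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])

# Calling reshape without assigning the result leaves X unchanged.
X.reshape(-1, 1)
shape_before = X.shape
print(shape_before)  # (3,)

# Assigning the returned array back to X gives the 2D shape.
X = X.reshape(-1, 1)
print(X.shape)       # (3, 1)
```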
