How to fix "ValueError: Expected 2D array, got 1D array instead" in sklearn/python?

Question

Asked by Karthik Bhojaraj

Hi there. I just started with machine learning, using a simple example to try and learn. I want to classify the files on my disk by file type using a classifier. The code I have written is:

import sklearn
import numpy as np


#Importing a local data set from the desktop
import pandas as pd
mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
print mydata


x_train = mydata.script
y_train = mydata.label

#print x_train
#print y_train
x_test = mydata.script

from sklearn import tree
classi = tree.DecisionTreeClassifier()

classi.fit(x_train, y_train)

predictions = classi.predict(x_test)
print predictions

And I am getting the following error:

  script  class  div   label
0       5      6    7    html
1       0      0    0  python
2       1      1    1     csv
Traceback (most recent call last):
  File "newtest.py", line 21, in <module>
    classi.fit(x_train, y_train)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 410, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 5.  0.  1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

If anyone can help me with the code, it would be very helpful!

Answer 1

Answered by cs95

When passing your input to the classifiers, pass 2D arrays (of shape (M, N), where N >= 1), not 1D arrays (which have shape (N,)). The error message is pretty clear:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
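
As a rough illustration of what the two reshape calls in that message do, here is a small NumPy sketch. The array values are copied from the traceback in the question; the variable names are only illustrative.

```python
import numpy as np

# The 1D array from the error message: three values, shape (3,)
arr = np.array([5.0, 0.0, 1.0])

# Three samples with one feature each -> shape (3, 1)
as_single_feature = arr.reshape(-1, 1)

# One sample with three features -> shape (1, 3)
as_single_sample = arr.reshape(1, -1)

print(arr.shape)                # (3,)
print(as_single_feature.shape)  # (3, 1)
print(as_single_sample.shape)   # (1, 3)
```

In the question, "script" is one feature column observed for three files, so reshape(-1, 1) is the matching choice there.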

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X must be 2D: double brackets give a DataFrame of shape (N, M), M >= 1
X = mydata[['script']]
# y stays 1D, with shape (N,)
y = mydata['label']
# optionally encode string labels as integers
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# the classifier from the question; any scikit-learn estimator works here
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
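
The key change above is the double brackets. Here is a minimal sketch of the difference, assuming a DataFrame rebuilt by hand from the three rows printed in the question (the real code reads file_format.csv instead):

```python
import pandas as pd

# Hand-built stand-in for the CSV shown in the question (an assumption)
mydata = pd.DataFrame({
    "script": [5, 0, 1],
    "class": [6, 0, 1],
    "div": [7, 0, 1],
    "label": ["html", "python", "csv"],
})

# Single brackets return a 1D Series
one_d = mydata["script"]
# Double brackets return a 2D DataFrame with one column
two_d = mydata[["script"]]
# Several feature columns can be selected the same way
all_features = mydata[["script", "class", "div"]]

print(one_d.shape)         # (3,)
print(two_d.shape)         # (3, 1)
print(all_features.shape)  # (3, 3)
```

The 1D Series is what triggers the error in the question; either of the 2D selections has the shape that fit expects for its first argument.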

Some other helpful tips:

Split your data into separate train and test portions. Do not use your training data to test: that leads to overly optimistic estimates of your classifier's strength.
I'd recommend factorizing your labels, so that you're dealing with integers. It's just easier.
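
For the factorizing tip, here is a small sketch of what pd.factorize returns. It uses the three label values from the question, with one repeated to show that equal labels share a code; the names codes, uniques and decoded are illustrative.

```python
import pandas as pd

labels = pd.Series(["html", "python", "csv", "html"])

# factorize assigns integer codes in order of first appearance
codes, uniques = pd.factorize(labels)

print(codes)          # [0 1 2 0]
print(list(uniques))  # ['html', 'python', 'csv']

# Map integer codes (e.g. predictions) back to the original strings
decoded = uniques[codes]
print(list(decoded))  # ['html', 'python', 'csv', 'html']
```

Keeping uniques around is what makes it possible to turn integer predictions back into readable labels later.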

Answer 2

Answered by Ameya Marathe

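from sklearn.linear_model import LinearRegression

# dataset is assumed to be a pandas DataFrame loaded earlier
X = dataset.iloc[:, 0].values  # 1D array, shape (N,)
y = dataset.iloc[:, 1].values

regressor = LinearRegression()
# reshape returns a new array, so assign the result back to X
X = X.reshape(-1, 1)  # now 2D, shape (N, 1)
regressor.fit(X, y)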

This is the code I used. reshape is not an in-place operation: it returns a new array and leaves the original untouched. So the result has to be assigned back to X, as in the reshape line above.
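
A minimal sketch of that behaviour, using a made-up three-element array (the values are only for illustration):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])

# Calling reshape without assigning the result leaves X unchanged
X.reshape(-1, 1)
shape_before = X.shape
print(shape_before)  # (3,)

# Assigning the returned array back to X makes the change stick
X = X.reshape(-1, 1)
print(X.shape)  # (3, 1)
```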
