Python sklearn LinearSVC - X has 1 features per sample; expecting 5

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/32106063/

Tags: python, machine-learning, scikit-learn

Asked by Radu Gheorghiu

I'm trying to predict the class of a test array, but I'm getting the below error, along with the stack trace:

Traceback (most recent call last):
  File "/home/radu/PycharmProjects/Recommender/Temporary/classify_dict_test.py", line 24, in <module>
    print classifier.predict(test)
  File "/home/radu/.local/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 215, in predict
    scores = self.decision_function(X)
  File "/home/radu/.local/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 196, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 1 features per sample; expecting 5

The code that produces this error is:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

corpus = [
    "I am super good with Java and JEE",
    "I am super good with .NET and C#",
    "I am really good with Python and R",
    "I am really good with C++ and pointers"
    ]

classes = ["java developer", ".net developer", "data scientist", "C++ developer"]

test = ["I think I'm a good developer with really good understanding of .NET"]

tvect = TfidfVectorizer(min_df=1, max_df=1)

X = tvect.fit_transform(corpus)

classifier = LinearSVC()
classifier.fit(X, classes)

print classifier.predict(test)

I've tried looking into the LinearSVC documentation for guidelines or hints as to what might throw this error, but I can't figure it out.

Any help is greatly appreciated!

Accepted answer by Alexander Bauer

The variable test is a list of raw strings, but the SVC needs a feature vector with the same number of dimensions as X (5 here, as the error message says). You have to transform the test text into a feature vector using the same vectorizer instance before you feed it to the SVC:

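# Reuse the already-fitted vectorizer: transform(), not fit_transform(),
# so the test text is mapped onto the same 5 training features.
X_test = tvect.transform(test)
classifier.predict(X_test)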
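
For reference, here is a minimal end-to-end sketch of the fix, written for Python 3 (the question's code uses the Python 2 print statement). The first half is the question's script with the transform step added. The second half is an assumption on my part and not part of the original answer: it uses sklearn.pipeline.make_pipeline as one possible way to keep the vectorizer and the classifier together, so raw strings can be passed to predict directly. The names pipeline and pipeline_prediction are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

corpus = [
    "I am super good with Java and JEE",
    "I am super good with .NET and C#",
    "I am really good with Python and R",
    "I am really good with C++ and pointers",
]
classes = ["java developer", ".net developer", "data scientist", "C++ developer"]
test = ["I think I'm a good developer with really good understanding of .NET"]

# Learn the vocabulary from the training corpus only.
tvect = TfidfVectorizer(min_df=1, max_df=1)
X = tvect.fit_transform(corpus)

classifier = LinearSVC()
classifier.fit(X, classes)

# Map the test text onto the SAME vocabulary with transform(),
# never fit_transform(), so the column count matches X.
X_test = tvect.transform(test)
print(X.shape, X_test.shape)  # (4, 5) (1, 5)

prediction = classifier.predict(X_test)
print(prediction)

# Alternative (not from the original answer): a pipeline applies the
# vectorizer automatically in both fit() and predict().
pipeline = make_pipeline(TfidfVectorizer(min_df=1, max_df=1), LinearSVC())
pipeline.fit(corpus, classes)
pipeline_prediction = pipeline.predict(test)
print(pipeline_prediction)
```

With the pipeline variant, forgetting the transform step is no longer possible, because predict always receives raw text and vectorizes it internally with the fitted vocabulary.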