Python scikit-learn: Reshape your data using X.reshape(-1, 1)

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original address and author information. Original StackOverflow question: http://stackoverflow.com/questions/35166146/

Time: 2020-08-19 16:04:43  Source: igfitidea

sci-kit learn: Reshape your data either using X.reshape(-1, 1)

python scikit-learn

Asked by sareem

I'm training a Python (2.7.11) classifier for text classification, and while running it I get a deprecation warning, but I can't tell which line of my code causes it. The code itself works fine and gives me results. The warning is:

\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

My code:

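import csv
import sys

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def main():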
    data = []
    folds = 10
    ex = [ [] for x in range(0,10)]
    results = []
    for i,f in enumerate(sys.argv[1:]):
        data.append(csv.DictReader(open(f,'r'),delimiter='\t'))
    for f in data:       
        for i,datum in enumerate(f):
            ex[i % folds].append(datum)
    #print ex
    for held_out in range(0,folds):
        l = []
        cor = []
        l_test = []
        cor_test = []
        vec = []
        vec_test = []

        for i,fold in enumerate(ex):
            for line in fold:
                if i == held_out:
                    l_test.append(line['label'].rstrip("\n"))
                    cor_test.append(line['text'].rstrip("\n"))
                else:
                    l.append(line['label'].rstrip("\n"))
                    cor.append(line['text'].rstrip("\n"))

        vectorizer = CountVectorizer(ngram_range=(1,1),min_df=1)
        X = vectorizer.fit_transform(cor)
        for c in cor:        
            tmp = vectorizer.transform([c]).toarray()
            vec.append(tmp[0])
        for c in cor_test:        
            tmp = vectorizer.transform([c]).toarray()
            vec_test.append(tmp[0])

        clf = MultinomialNB()
        clf.fit(vec, l)
        # accuracy() is the asker's own helper; its definition is not shown in the question
        result = accuracy(l_test, vec_test, clf)
        print result

if __name__ == "__main__":
    main()

Any idea which line raises this warning? Another issue is that running this code with different data sets gives me exactly the same accuracy, and I can't figure out what causes this. Also, I want to use this model in another Python process. I looked at the documentation and found an example using the pickle library, but none for joblib. So I tried following the same code, but this gave me errors:

clf = joblib.load('model.pkl') 
pred = clf.predict(vec);

Also, if my data is a CSV file in this format: "label \t text \n", what should be in the label column of the test data?

Thanks in advance

Answered by MSeifert

If you want to find out where the warning is coming from, you can temporarily promote warnings to exceptions. This will give you a full traceback and thus the lines where your program encountered the warning.

import warnings

with warnings.catch_warnings():
    warnings.simplefilter("error")
    main()

If you run the program from the command line, you can also use the -W flag. More information on warning handling can be found in the Python documentation.
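
As a minimal, self-contained sketch of this approach (the function `noisy` below is a hypothetical stand-in for whatever library call emits the warning, not part of the original code), promoting warnings to exceptions inside a `catch_warnings` block lets you capture the traceback and see the offending call:

```python
import traceback
import warnings


def noisy():
    # Hypothetical stand-in for a library call that emits a DeprecationWarning.
    warnings.warn("Passing 1d arrays as data is deprecated", DeprecationWarning)
    return "done"


def find_warning_source():
    """Run noisy() with warnings promoted to exceptions; return the traceback text."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # every warning now raises an exception
        try:
            noisy()
        except DeprecationWarning:
            return traceback.format_exc()
    return None


if __name__ == "__main__":
    print(find_warning_source())
```

When running from the command line, `python -W error your_script.py` should have a similar effect without editing the code (the script name here is a placeholder).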

I know I answered only one part of your question, but did you debug your code?

Answered by Ramin Fallahzadeh

It's:

pred = clf.predict(vec);

I used this in my code and it worked:

import numpy as np

# This makes it into a 2D array
temp = [2, 70, 90, 1]  # an instance
temp = np.array(temp).reshape((1, -1))
print(model.predict(temp))
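
To make the difference between the two reshape calls named in the warning concrete, here is a small sketch using only NumPy (the sample values are reused from the snippet above, and the variable names are illustrative):

```python
import numpy as np

sample = np.array([2, 70, 90, 1])  # 1D array, shape (4,)

# One sample with four features: one row, four columns.
as_single_sample = sample.reshape(1, -1)

# Four samples with one feature each: four rows, one column.
as_single_feature = sample.reshape(-1, 1)

print(sample.shape, as_single_sample.shape, as_single_feature.shape)
```

Which form is right depends on what the 1D array represents: a single instance to predict on needs `reshape(1, -1)`, while a column of values for one feature needs `reshape(-1, 1)`.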

Answered by Heavy Breathing

The 'vec' input to your clf.fit(vec, l) call needs to be of type [[]], not just []. This is a quirk that I always forget when I fit models.

Just adding an extra set of square brackets should do the trick!

Answered by Bharath

Passing a 1D array is deprecated, so try passing a 2D array as the parameter. This might help.

clf = joblib.load('model.pkl') 
pred = clf.predict([vec]);
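
The following is a rough end-to-end sketch of the same idea, assuming `vec` holds a single feature vector. The toy texts and labels are made up for illustration and are not the asker's data. It also shows that the matrix returned by `CountVectorizer.transform` is already 2D, so it can be passed to `predict` without any reshaping:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, only for illustration.
train_texts = ["good great fine", "bad awful poor", "great good", "poor bad"]
train_labels = ["pos", "neg", "pos", "neg"]

vectorizer = CountVectorizer(ngram_range=(1, 1), min_df=1)
X_train = vectorizer.fit_transform(train_texts)  # 2D sparse matrix (n_samples, n_features)

clf = MultinomialNB()
clf.fit(X_train, train_labels)

# transform() takes a list of documents and returns a 2D matrix with one row each.
one_row = vectorizer.transform(["good fine"])
vec = one_row.toarray()[0]  # 1D vector, like a single entry of `vec` in the question

pred_from_matrix = clf.predict(one_row)  # already 2D, no reshape needed
pred_from_list = clf.predict([vec])      # extra brackets turn the 1D vector into 2D
print(pred_from_matrix, pred_from_list)
```

Both calls should give the same prediction; the first simply avoids building the 1D vector in the first place.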

Answered by Atongsa Miyamoto

Two solutions. The idea behind both: turn your data from 1D into 2D.

  1. Just add: []

    vec = [vec]
    
  2. Reshape your data

    import numpy as np
    vec = np.array(vec).reshape(1, -1)
    

Answered by Shivprasad Ktheitroadala

The predict method expects a 2D array. You can watch this video, where I have also linked the exact time: https://youtu.be/KjJ7WzEL-es?t=2602. You have to change from [] to [[]].
