Should a pandas dataframe column be converted in some way before passing it to a scikit learn regressor?
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must apply the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/20868664/
Asked by user2808117
I have a pandas dataframe and am passing df[list_of_columns] as X and df[[single_column]] as Y to a Random Forest regressor.
What does the following warning mean, and what should be done to resolve it?
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
probas = cfr.fit(trainset_X, trainset_Y).predict(testset_X)
Accepted answer by lejlot
Simply check the shape of your Y variable. It should be a one-dimensional object, and you are probably passing something with more (possibly trivial) dimensions. Reshape it into a list or 1d array.
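To make the shape check concrete, here is a minimal sketch. The toy frame and its column names ("feature", "target") are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical toy frame; the column names are assumptions for illustration.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "target": [10.0, 20.0, 30.0]})

y_2d = df[["target"]]  # double brackets select a DataFrame: one column, but still 2-D
y_1d = df["target"]    # single brackets select a Series: 1-D

print(y_2d.shape)  # (3, 1)
print(y_1d.shape)  # (3,)
```

The extra length-1 dimension in the first result is the "trivial" dimension mentioned above.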
Answer by Matt
You can use df.single_column.values or df['single_column'].values to get the underlying numpy array of your series (which, in this case, should also have the correct 1D shape, as mentioned by lejlot).
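A small sketch of both spellings, again on a made-up frame (the column name "target" is an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame for illustration only.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "target": [10.0, 20.0, 30.0]})

y_attr = df.target.values     # attribute access, works when the name is a valid identifier
y_item = df["target"].values  # item access, works for any column name

print(type(y_item).__name__, y_item.shape)  # ndarray (3,)
```

Newer pandas versions also offer df["target"].to_numpy(), which returns the same kind of 1-D array here.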
Answer by Salvador Dali
Actually, the warning tells you exactly what the problem is:
You passed a 2d array that happens to have the shape (X, 1), but the method expects a 1d array with the shape (X, ).
Moreover, the warning tells you how to convert it to the shape you need: y.values.ravel().
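As a rough illustration of that conversion, assuming y is a one-column DataFrame (the frame and the name "target" are hypothetical):

```python
import pandas as pd

# Hypothetical one-column target, selected with double brackets as in the question.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "target": [10.0, 20.0, 30.0]})
y = df[["target"]]

column = y.values      # 2-D numpy array, shape (3, 1)
flat = column.ravel()  # flattened to 1-D, shape (3,)

print(column.shape)  # (3, 1)
print(flat.shape)    # (3,)
```

ravel() only changes the shape; the values and their order stay the same.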
Answer by Dmitriy Biloshytskiy
Using Y = df[[single_column]].values.ravel() solves the DataConversionWarning for me.
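One way to check this end to end is to record warnings while fitting with both shapes of Y. This is only a sketch: the synthetic data, the column names, the helper fit_and_collect, and the forest settings are all assumptions made for illustration.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import DataConversionWarning

# Synthetic data; column names "a", "b", "target" are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
df["target"] = 2 * df["a"] - df["b"]

X = df[["a", "b"]]


def fit_and_collect(y):
    """Fit a small forest and return any DataConversionWarning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)
    return [w for w in caught if issubclass(w.category, DataConversionWarning)]


warned_2d = fit_and_collect(df[["target"]])                 # column vector, shape (50, 1)
warned_1d = fit_and_collect(df[["target"]].values.ravel())  # flattened, shape (50,)

print(len(warned_2d), len(warned_1d))
```

The column-vector fit should record the warning, while the flattened one should not.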

