Should a pandas dataframe column be converted in some way before passing it to a scikit-learn regressor?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/20868664/

Date: 2020-09-13 21:30:23  Source: igfitidea

Should a pandas dataframe column be converted in some way before passing it to a scikit learn regressor?

Tags: pandas, scikit-learn

Asked by user2808117

I have a pandas dataframe and am passing df[list_of_columns] as X and df[[single_column]] as Y to a Random Forest regressor.

What does the following warning mean, and what should be done to resolve it?

DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
    probas = cfr.fit(trainset_X, trainset_Y).predict(testset_X)

Accepted answer by lejlot

Simply check the shape of your Y variable: it should be a one-dimensional object, and you are probably passing something with more (possibly trivial) dimensions. Reshape it into a list or 1d array.
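To make the shape check concrete, here is a minimal sketch. The toy dataframe and its column names ("a", "b", "target") are made up for illustration and are not from the question. It shows that double-bracket selection keeps a trivial second dimension, while single-bracket selection does not.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, 6.0],
    "target": [0.5, 1.5, 2.5],
})

y_2d = df[["target"]]  # double brackets select a DataFrame (column vector)
y_1d = df["target"]    # single brackets select a Series (one-dimensional)

print(y_2d.shape)  # (3, 1)
print(y_1d.shape)  # (3,)
```

The (3, 1) shape is the "column-vector y" that the warning refers to.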

Answer by Matt

You can use df.single_column.values or df['single_column'].values to get the underlying numpy array of your Series (which, in this case, should also have the correct 1D shape, as mentioned by lejlot).

Answer by Salvador Dali

Actually, the warning tells you exactly what the problem is:

You passed a 2d array that happened to have the shape (X, 1), but the method expects a 1d array of shape (X, ).

Moreover, the warning tells you how to transform y into the form you need: y.values.ravel().
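As a small sketch of what that call does, assume y is a single-column dataframe (the data and the column name "target" are invented for illustration). The .values attribute gives the underlying 2d numpy array, and ravel() flattens it to one dimension.

```python
import pandas as pd

# Hypothetical single-column dataframe standing in for y.
y = pd.DataFrame({"target": [0.5, 1.5, 2.5]})

y_flat = y.values.ravel()  # 2d array of shape (3, 1) flattened to shape (3,)

print(y.values.shape)  # (3, 1)
print(y_flat.shape)    # (3,)
```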

Answer by Dmitriy Biloshytskiy

Using Y = df[[single_column]].values.ravel() solved the DataConversionWarning for me.
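Putting the pieces together, the following is a rough end-to-end sketch rather than the asker's actual code. The random data, the column names ("f1", "f2", "target"), and the regressor settings are all assumptions. It turns DataConversionWarning into an error inside the with block, so the fit would fail loudly if y still had shape (n_samples, 1).

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import DataConversionWarning

# Hypothetical random data; column names are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 3)), columns=["f1", "f2", "target"])

X = df[["f1", "f2"]]               # 2d feature matrix, shape (20, 2)
y = df[["target"]].values.ravel()  # flattened target, shape (20,)

model = RandomForestRegressor(n_estimators=10, random_state=0)

with warnings.catch_warnings():
    # Escalate the warning to an exception so a wrong y shape cannot go unnoticed.
    warnings.simplefilter("error", DataConversionWarning)
    model.fit(X, y)

predictions = model.predict(X)
print(predictions.shape)  # (20,)
```

Passing df["target"] directly, without .values.ravel(), should work as well, since a Series is already one-dimensional.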