Python RandomForest - Unknown label Error

Note: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/34246336/

Time: 2020-08-19 14:41:30  Source: igfitidea

Tags: python, python-3.x, scikit-learn, random-forest

Asked by Dragonfly

I am having trouble using the RandomForest fit function.

This is my training set:

         P1      Tp1           IrrPOA     Gz          Drz2
0        0.0     7.7           0.0       -1.4        -0.3
1        0.0     7.7           0.0       -1.4        -0.3
2        ...     ...           ...        ...         ...
3        49.4    7.5           0.0       -1.4        -0.3
4        47.4    7.5           0.0       -1.4        -0.3
... (10k rows)

I want to predict P1 from all the other variables using RandomForest from sklearn.ensemble:

colsRes = ['P1']
X_train = train.drop(colsRes, axis = 1)
Y_train = pd.DataFrame(train[colsRes])
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, Y_train)

Here is the error I get:

ValueError: Unknown label type: array([[  0. ],
       [  0. ],
       [  0. ],
       ..., 
       [ 49.4],
       [ 47.4],

I did not find anything about this label error. I am using Python 3.5. Any advice would be a great help!

Accepted answer by Gurupad Hegde

When you pass the label (y) data to rf.fit(X, y), it expects y to be a 1D list. Selecting columns from a pandas DataFrame with a list of names, as in train[colsRes], results in a 2D structure, which is what causes the conflict in your use case. You need to convert the 2D data provided by the pandas DataFrame to a 1D list, as expected by the fit function.
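As a rough illustration of the 1D versus 2D difference described above, here is a minimal sketch on a tiny made-up frame. The name `train` and the values are placeholders, not the asker's actual data:

```python
import pandas as pd

# Tiny stand-in for the asker's training set (values are illustrative only)
train = pd.DataFrame({
    "P1": [0.0, 0.0, 49.4, 47.4],
    "Tp1": [7.7, 7.7, 7.5, 7.5],
})

y_2d = train[["P1"]]                    # list of column names -> DataFrame, shape (4, 1)
y_1d = train["P1"]                      # single column name   -> Series, shape (4,)
y_flat = train[["P1"]].values.ravel()   # flatten a 2D column into a 1D array

print(y_2d.shape, y_1d.shape, y_flat.shape)
```

Selecting with a single column name, or flattening with ravel(), are two ways to end up with a 1D target.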

Try using a 1D list first:

Y_train = list(train.P1.values)

If this does not solve the problem, you can try the solution mentioned in MultinomialNB error: "Unknown Label Type":

Y_train = np.asarray(train['P1'], dtype="|S6")

So your code becomes:

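import numpy as np
from sklearn.ensemble import RandomForestClassifier

colsRes = ['P1']
X_train = train.drop(colsRes, axis = 1)
# Convert the float labels to byte strings so they are treated as class labels
Y_train = np.asarray(train['P1'], dtype="|S6")
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, Y_train)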

Answer by N. Wouda

According to this SO post, classifiers need integer or string labels.
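To see how scikit-learn categorizes a label array, one option is the `type_of_target` helper from `sklearn.utils.multiclass`. The sketch below uses made-up label values. It only shows the categorization and is not necessarily the exact check performed inside `fit`:

```python
from sklearn.utils.multiclass import type_of_target

float_labels = [0.0, 49.4, 47.4]      # continuous values, like P1
int_labels = [0, 49, 47]              # integer class labels
str_labels = ["low", "high", "mid"]   # string class labels

print(type_of_target(float_labels))   # continuous
print(type_of_target(int_labels))     # multiclass
print(type_of_target(str_labels))     # multiclass
```

A target reported as continuous is the kind of data a regressor is designed for.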

You could consider switching to a regression model instead (that might better suit your data, as each datum appears to be a float), like so:

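from sklearn.ensemble import RandomForestRegressor

X_train = train.drop('P1', axis=1)
Y_train = train['P1']
rf = RandomForestRegressor(n_estimators=100)
# as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
rf.fit(X_train.to_numpy(), Y_train.to_numpy())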
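For a version that can be run end to end, here is a minimal sketch of the regression approach on synthetic data. The column names follow the question, but the values are random placeholders, and `random_state` is added only to make the run repeatable:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the training set; column names follow the question,
# the values are random and purely illustrative.
rng = np.random.default_rng(0)
train = pd.DataFrame(
    rng.normal(size=(200, 5)),
    columns=["P1", "Tp1", "IrrPOA", "Gz", "Drz2"],
)

X_train = train.drop("P1", axis=1)
y_train = train["P1"]            # 1D Series of float targets

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)         # float targets are fine for a regressor

pred = rf.predict(X_train.head(3))
print(pred.shape)
```

Predicting on the training rows here only confirms that the fit went through. It says nothing about model quality.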

Answer by RunD.M.C.

I may be a tad late to the party, but I just got this error and solved it by making sure my y variable was of type int, using

 y = df['y_variable'].astype(int) 

before doing the train/test split. Also, as others have said, your problem seems better suited to a RandomForestRegressor (RFReg) than to a RandomForestClassifier (RF).
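One thing worth keeping in mind with the astype(int) approach is that it drops the fractional part, so the targets change before the classifier sees them. A small sketch on made-up values (`y_variable` is the placeholder column name from the answer above):

```python
import pandas as pd

df = pd.DataFrame({"y_variable": [0.0, 49.4, 47.4, 47.9]})

# astype(int) truncates toward zero: 49.4 -> 49, 47.9 -> 47
y = df["y_variable"].astype(int)

print(y.tolist())    # [0, 49, 47, 47]
print(y.nunique())   # 3 distinct class labels remain
```

If the fractional part matters for the prediction, this is another reason a regressor may be the better fit.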