Python: "continuous is not supported" error in RandomForestRegressor

Question

Asked by toy

I'm just trying to run a simple RandomForestRegressor example, but when I test the accuracy I get this error:

/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc in accuracy_score(y_true, y_pred, normalize, sample_weight)
    177
    178     # Compute accuracy for each possible representation
--> 179     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    180     if y_type.startswith('multilabel'):
    181         differing_labels = count_nonzero(y_true - y_pred, axis=1)

/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc in _check_targets(y_true, y_pred)
     90     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
     91                        "multilabel-sequences"]):
---> 92         raise ValueError("{0} is not supported".format(y_type))
     93
     94     if y_type in ["binary", "multiclass"]:

ValueError: continuous is not supported

This is the sample of the data. I can't show the real data.

target, func_1, func_2, func_3, ... func_200
float, float, float, float, ... float

Here's my code.

import pandas as pd
import numpy as np
from sklearn.preprocessing import Imputer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.cross_validation import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree

train = pd.read_csv('data.txt', sep='\t')

labels = train.target
train.drop('target', axis=1, inplace=True)
cat = ['cat']
train_cat = pd.get_dummies(train[cat])

train.drop(train[cat], axis=1, inplace=True)
train = np.hstack((train, train_cat))

imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit(train)
train = imp.transform(train)

x_train, x_test, y_train, y_test = train_test_split(train, labels.values, test_size = 0.2)

clf = RandomForestRegressor(n_estimators=10)

clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
accuracy_score(y_test, y_pred) # This is where I get the error.

Answer 1

Accepted answer by Ibraim Ganiev

It's because accuracy_score is for classification tasks only. For regression you should use a different metric, for example:

clf.score(X_test, y_test)

Here X_test holds the samples and y_test the corresponding ground-truth values. The score method computes the predictions internally and, for a regressor, returns the R-squared coefficient of determination.
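To make the difference concrete, here is a minimal pure-Python sketch of what a regression metric measures compared with exact-match accuracy. The helper names (r2, mse, exact_match_accuracy) and the four sample values are made up for illustration and are not from the original post; scikit-learn ships equivalent metrics as r2_score and mean_squared_error in sklearn.metrics.

```python
# Illustrative only: hand-rolled metrics on made-up data.

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the target exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical continuous targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.1, 7.8]

print("R^2:", r2(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
# Close predictions still count as misses under exact matching.
print("exact-match accuracy:", exact_match_accuracy(y_true, y_pred))
```

The predictions are close to the targets, so R-squared is high and the error is small, yet no prediction matches a float target exactly. That is why an accuracy-style metric is not meaningful for continuous outputs.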

Answer 2

Answer by ThReSholD

Since you are doing a regression task, you should use the R-squared metric (coefficient of determination) instead of the accuracy score, which is meant for classification.

To avoid confusion, I suggest using a different variable name for the regressor, such as reg or rfr, rather than clf.

R-squared can be computed by calling the score method provided by RandomForestRegressor, for example:

rfr.score(X_test,Y_test)
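As a rough end-to-end sketch of the advice above, the following fits a RandomForestRegressor and evaluates it with regression metrics. The synthetic data (200 rows, 5 float features) is an assumption standing in for the poster's real data, and train_test_split is imported from sklearn.model_selection because sklearn.cross_validation is no longer available in newer scikit-learn releases.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption, not from the post).
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# "rfr" rather than "clf", since this is a regressor.
rfr = RandomForestRegressor(n_estimators=10, random_state=0)
rfr.fit(X_train, y_train)

y_pred = rfr.predict(X_test)

# score() predicts internally and returns R^2, so it should agree
# with r2_score computed on explicit predictions.
r2_from_score = rfr.score(X_test, y_test)
r2_from_metric = r2_score(y_test, y_pred)
mse_value = mean_squared_error(y_test, y_pred)

print("R^2 via score():  ", r2_from_score)
print("R^2 via r2_score():", r2_from_metric)
print("MSE:", mse_value)
```

Calling accuracy_score(y_test, y_pred) here would raise the same ValueError as in the question, because both arrays are continuous.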
