pandas ValueError: continuous is not supported
Original question: http://stackoverflow.com/questions/33047525/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
ValueError: continuous is not supported
Asked by Toly
I am using GridSearchCV for cross validation of a linear regression (not a classifier nor a logistic regression).
I also use StandardScaler to normalize X.
My dataframe has 17 features (X) and 5 targets (y), with around 1150 rows (observations).
I keep getting the error message "ValueError: continuous is not supported" and have run out of options.
Here is some code (assume all imports are done properly):
soilM = pd.read_csv('C:/training.csv', index_col=0)
soilM = getDummiedSoilDepth(soilM)  # transform text values into 0 and 1
soilM = soilM.drop('Depth', 1)
soil = soilM.iloc[:,-22:]

X_train, X_test, Ca_train, Ca_test, P_train, P_test, pH_train, pH_test, SOC_train, SOC_test, Sand_train, Sand_test = splitTrainTestAdv(soil)

scores = ['precision', 'recall']
for score in scores:
    for model in MODELS.keys():
        print model, score
        performParameterSelection(model, score, X_test, Ca_test, X_train, Ca_train)

def performParameterSelection(model_name, criteria, X_test, y_test, X_train, y_train):
    model, param_grid = MODELS[model_name]
    gs = GridSearchCV(model, param_grid, n_jobs=1, cv=5, verbose=1, scoring='%s_weighted' % criteria)
    gs.fit(X_train, y_train)
    print(gs.best_params_)
    for params, mean_score, scores in gs.grid_scores_:
        print("%0.3f (+/-%0.03f) for %r"
              % (mean_score, scores.std() * 2, params))
    y_true, y_pred = y_test, gs.predict(X_test)
    print(classification_report(y_true, y_pred))

MODELS = {
    'lasso': (
        linear_model.Lasso(),
        {'alpha': [0.95]}
    ),
    'ridge': (
        linear_model.Ridge(),
        {'alpha': [0.01]}
    ),
    'elasticnet': (
        linear_model.ElasticNet(),
        {
            'alpha': [0.6],
            'l1_ratio': [0.4]
        }
    ),
    'svr': (
        svm.SVR(),
        {
            'C': [5.0],
            'epsilon': [0.1],
            'kernel': ['linear']
        }
    )
}

def performLasso(X_train, y_train, X_test, parameter):
    alpha = parameter[0]
    model = linear_model.Lasso(alpha=alpha, normalize=True)  # pass alpha to Lasso
    model.fit(X_train, y_train)
    return model.predict(X_test)

def splitTrainTestAdv(df):
    y = df.iloc[:,-5:].copy()    # last 5 columns
    X1 = df.iloc[:,:-5].copy()   # all except the last 5 columns
    Ca = y['Ca'].copy()
    P = y['P'].copy()
    pH = y['pH'].copy()
    SOC = y['SOC'].copy()
    Sand = y['Sand'].copy()
    # Scaling and sampling
    X = StandardScaler(copy=False).fit_transform(X1)
    X_train, X_test, Ca_train, Ca_test = train_test_split(X, Ca, test_size=0.2, random_state=0)
    return X_train, X_test, Ca_train, Ca_test, P_train, P_test, pH_train, pH_test, SOC_train, SOC_test, Sand_train, Sand_test
These are the main pieces of the code.
This is the main part of the error output:
ValueError Traceback (most recent call last)
<ipython-input-90-1315d47e2551> in <module>()
20 print '####################'
21 print featuresV[1]
---> 22 performParameterSelection(model, score, X_test, Ca_test, X_train, Ca_train)
23 print featuresV[2]
24 performParameterSelection(model, score, X_test, P_test, X_train, P_train)
<ipython-input-41-7075e1a49412> in performParameterSelection(model_name, criteria, X_test, y_test, X_train, y_train)
12 # cv=5 - constant; verbose - keep writing
13
---> 14 gs.fit(X_train, y_train) # Will get grid scores with outputs from ALL models described above
15
16 #pprint(sorted(gs.grid_scores_, key=lambda x: -x.mean_validation_score))
C:\Users\Tony\Anaconda\lib\site-packages\sklearn\grid_search.pyc in fit(self, X, y)
730
731 """
--> 732 return self._fit(X, y, ParameterGrid(self.param_grid))
90 if (y_type not in ["binary", "multiclass", "multilabel-indicator",
91 "multilabel-sequences"]):
---> 92 raise ValueError("{0} is not supported".format(y_type))
93
94 if y_type in ["binary", "multiclass"]:
ValueError: continuous is not supported
Here is some data from soil.head(15). It does not show all the columns, but the code should behave the same way with 8 features instead of 17. As for the targets: these are the last 5 columns, but the code here only handles one of them (Ca).
BSAN BSAS BSAV CTI ELEV EVI LSTD LSTN REF1 REF2 ... RELI Subsoil Topsoil TMAP TMFI Ca P pH SOC Sand
PIDN
92RkYor6 -0.405797 -0.563636 -0.806271 -0.228241 -0.691982 1.653790 -0.605889 0.627488 -0.856727 0.056586 ... -0.062181 0 1 0.896228 1.651807 -0.394962 0.031291 0.488676 -0.389042 0.630347
nPv9P04t -0.688406 -0.709091 -0.739082 -0.189180 1.185523 0.395773 -0.381748 -0.338928 -0.774545 -0.818182 ... 2.995923 1 0 1.539208 1.618022 -0.460044 -0.366432 -0.549490 0.204798 -1.162260
oCASbXEx -0.623188 -0.654545 -0.727884 -0.155835 0.711136 0.517493 -0.035002 -0.092554 -0.725818 -0.651206 ... -0.300034 1 0 0.286952 0.657765 0.259613 -0.407934 0.591558 -0.529688 -0.793082
xq94dGBz -0.746377 -0.781818 -0.862262 -0.340427 0.791314 0.672741 -0.665032 -0.128613 -0.853091 -0.741187 ... -0.418960 0 1 0.276740 0.678724 -0.467854 -0.245386 -0.577548 -0.428111 -0.130845
GYSYA8Yf -0.862319 -0.836364 -0.783875 -0.020427 4.715590 0.473032 -1.321194 -2.560069 -0.791273 -0.827458 ... 2.299354 1 0 0.583042 1.825040 1.442361 -0.328389 0.797320 -0.443738 -0.892037
G4e9Ahvi -0.710145 -0.736364 -0.727884 -0.175122 -1.003786 0.744898 -0.678329 0.851702 -0.661818 -0.474954 ... -0.300034 1 0 1.544703 1.641861 -0.355335 -0.079380 -0.287610 -0.256209 0.287810
SHU443XO -0.579710 -0.736364 -0.963046 -0.536744 -0.179733 1.793003 -0.914052 0.291898 -0.966545 -0.086271 ... 0.260618 0 1 1.840689 2.223996 -0.499961 0.155796 -0.886192 -0.107749 0.942435
oAeygDKu -0.152174 -0.154545 -0.134378 1.252267 -0.796659 -0.155977 1.309391 0.642680 -0.205818 -0.341373 ... -0.537887 1 0 -0.320335 0.429981 -0.441821 -0.352598 0.339031 -0.826609 1.650344
agBvYkUI -0.724638 -0.790909 -0.839866 0.114245 1.363697 0.726676 -1.687885 0.060034 -0.706909 -0.523191 ... 1.127081 1 0 1.254782 0.972442 -0.505456 -0.345681 -1.774712 0.071966 -1.207931
8ujcZd8d -0.427536 -0.600000 -0.806271 -0.667808 -1.208686 2.008018 -1.276453 1.203854 -0.698182 0.224490 ... 0.107713 0 1 0.288463 0.013744 -0.362277 -0.338764 0.039740 -0.232768 0.451467
hqO5LhmQ -0.644928 -0.690909 -0.772676 -0.195877 1.138753 0.390671 0.145537 -0.544813 -0.722909 -0.729128 ... -0.537887 0 1 0.153926 0.422784 -0.460333 -0.300721 -0.063142 -0.607825 1.208852
QsfH8CWp -0.449275 -0.618182 -0.862262 -0.512923 -0.712027 1.537901 -0.665190 0.595265 -0.884364 -0.103896 ... -0.028203 1 0 0.896228 1.651807 -0.475953 -0.252303 -0.128612 -0.670335 0.786391
5hhEGbrX -0.260870 -0.290909 -0.335946 -0.175122 -0.749889 0.400146 0.299908 0.567983 -0.423273 -0.244898 ... -0.520897 1 0 0.249117 0.907095 -0.142446 -0.397558 0.423206 -0.412483 -0.678903
XlJWsmdz -0.768116 -0.800000 -0.873460 -0.737115 0.682183 1.013848 -1.013065 -0.376346 -0.837818 -0.544527 ... 1.619776 1 0 0.942437 1.482143 -0.358517 1.283256 -0.072494 -0.490620 -0.899649
FY3riRgw -0.818841 -0.863636 -0.873460 -0.739177 1.715590 1.434402 -1.669818 -0.090647 -0.874909 -0.388683 ... 3.182807 0 1 1.254782 0.972442 -0.333063 0.020916 -0.942309 1.314342 -0.690321
15 rows × 22 columns
Answered by Sergey Bushmanov
Your error "continuous is not supported" tells me you are applying something from the classification domain to a regression problem, where the target is continuous.
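To see where the word "continuous" in the message comes from, here is a minimal sketch using scikit-learn's type_of_target helper, which labels a target array by kind. The two arrays are made-up illustrations (the first borrows a few Ca values from the table above), not the asker's full data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical targets: real-valued measurements versus discrete class labels.
y_regression = np.array([-0.394962, -0.460044, 0.259613, 1.442361])
y_classes = np.array([0, 1, 1, 0])

regression_kind = type_of_target(y_regression)
class_kind = type_of_target(y_classes)

print(regression_kind)  # continuous
print(class_kind)       # binary
```

Classification scorers such as precision and recall only accept the discrete kinds, which matches the check shown in the traceback.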
At least one thing catches my eye, given that your target is continuous:
scores = ['precision', 'recall']
To start with, neither has anything to do with regression (as @zero323 pointed out in a comment on your question): they are performance measures for classification. Try any of the regression scores that suit you from the sklearn docs page, section "3.3.1.1. Common cases: predefined values".
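As a rough sketch of what that could look like, the block below runs a grid search with a regression scorer ("r2") and reports regression metrics instead of classification_report. It assumes a recent scikit-learn, where GridSearchCV lives in sklearn.model_selection. The data is synthetic (make_regression), and the grid values and pipeline are illustrative choices, not taken from the question.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the soil data: 1150 rows, 17 features, one continuous target.
X, y = make_regression(n_samples=1150, n_features=17, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scaling is part of the pipeline, so it is refit on each training fold.
pipe = make_pipeline(StandardScaler(), ElasticNet(max_iter=10000))
param_grid = {
    "elasticnet__alpha": [0.01, 0.1, 0.6],      # illustrative values
    "elasticnet__l1_ratio": [0.1, 0.4, 0.9],    # illustrative values
}

# "r2" is a regression scorer, so a continuous y is accepted.
gs = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
gs.fit(X_train, y_train)

y_pred = gs.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_mse = mean_squared_error(y_test, y_pred)

print(gs.best_params_)
print("test R^2: %0.3f, test MSE: %0.3f" % (test_r2, test_mse))
```

Other regression scorers can be swapped in through the scoring argument; their exact names depend on the scikit-learn version.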
As far as the rest of the code is concerned, I would strongly encourage you to rewrite it from scratch: one chunk for Lasso, one for Ridge, one for ElasticNet and one for SVM (why would you run Ridge and Lasso separately from ElasticNet, when they are special cases of ElasticNet?). This will take you no more than 10-15 lines of code. Only after you have made sure that all of them execute, that optimal hyperparameters are found, and that the desired regression metrics are calculated would I attempt to optimize the code and put everything together in a loop.
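The special-case remark can be checked for Lasso with a small sketch: in scikit-learn, ElasticNet with l1_ratio=1.0 uses a pure L1 penalty, so it should reproduce Lasso at the same alpha. The data here is synthetic and the alpha value is simply the one from the question's grid.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data, for illustration only.
X, y = make_regression(n_samples=200, n_features=17, noise=5.0, random_state=0)

alpha = 0.95
lasso = Lasso(alpha=alpha).fit(X, y)
enet_as_lasso = ElasticNet(alpha=alpha, l1_ratio=1.0).fit(X, y)

# Both models minimize the same objective, so the coefficients should agree.
same_coefficients = np.allclose(lasso.coef_, enet_as_lasso.coef_)
print(same_coefficients)
```

So a single ElasticNet grid that includes l1_ratio=1.0 already covers the Lasso case. The Ridge end (l1_ratio near 0) is not shown here, because Ridge and ElasticNet scale alpha differently in scikit-learn and the comparison is less direct.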
PS: How are these loops supposed to run
for score in scores:
    for model in MODELS.keys():
before MODELS is defined?

