pandas ValueError：不支持连续

Question

提问by Toly

I am using GridSearchCV for cross validation of a linear regression (not a classifier nor a logistic regression).

我正在使用 GridSearchCV 进行线性回归（不是分类器也不是逻辑回归）的交叉验证。

I also use StandardScaler for normalization of X

我还使用 StandardScaler 对 X 进行归一化

My dataframe has 17 features (X) and 5 targets (y) (observations). Around 1150 rows

我的数据框有 17 个特征 (X) 和 5 个目标 (y)（观察）。约 1150 行

I keep getting ValueError: continuous is not supported error message and ran out of options.

我不断收到 ValueError：continuous is not supported 错误消息并且用完了选项。

here is some code (assume all imports are done properly):

这是一些代码（假设所有导入都正确完成）：

soilM = pd.read_csv('C:/training.csv', index_col=0)
soilM = getDummiedSoilDepth(soilM) #transform text values in 0 and 1

soilM = soilM.drop('Depth', 1) 

soil = soilM.iloc[:,-22:]

X_train, X_test, Ca_train, Ca_test, P_train, P_test, pH_train, pH_test, SOC_train, SOC_test, Sand_train, Sand_test = splitTrainTestAdv(soil)

scores = ['precision', 'recall']


for score in scores:

    for model in MODELS.keys():

        print model, score

        performParameterSelection(model, score, X_test, Ca_test, X_train, Ca_train)

def performParameterSelection(model_name, criteria, X_test, y_test, X_train, y_train):

    model, param_grid = MODELS[model_name]
    gs = GridSearchCV(model, param_grid, n_jobs= 1, cv=5, verbose=1, scoring='%s_weighted' % criteria)

    gs.fit(X_train, y_train) 

    print(gs.best_params_)

    for params, mean_score, scores in gs.grid_scores_:
        print("%0.3f (+/-%0.03f) for %r"
          % (mean_score, scores.std() * 2, params))


    y_true, y_pred = y_test, gs.predict(X_test)
    print(classification_report(y_true, y_pred))


MODELS = {
    'lasso': (
        linear_model.Lasso(),
        {'alpha': [0.95]}
    ),
    'ridge': (
        linear_model.Ridge(),
        {'alpha': [0.01]}
    ),
    'elasticnet': (
        linear_model.ElasticNet(),
        {
            'alpha': [0.6],
            'l1_ratio': [0.4]
        }
    ),
    'svr': (
        svm.SVR(),
        {
            'C': [5.0],
            'epsilon': [0.1],
            'kernel': ['linear']
        }
    )
 }


def performLasso(X_train, y_train, X_test, parameter):

     alpha = parameter[0]

    model = linear_model.Lasso(alpha=alpha, normalize=True) #pass alpha to Lasso
    model.fit(X_train, y_train)



    return model.predict(X_test)

def splitTrainTestAdv(df):


    y = df.iloc[:,-5:].copy()  # last 5 columns
    X1 = df.iloc[:,:-5].copy()  # Except for last 5 columns

    Ca = y['Ca'].copy()
    P = y['P'].copy()
    pH = y['pH'].copy()
    SOC = y['SOC'].copy()
    Sand = y['Sand'].copy()


    #Scaling and Sampling

    X = StandardScaler(copy=False).fit_transform(X1)

    X_train, X_test, Ca_train, Ca_test = train_test_split(X, Ca, test_size=0.2, random_state=0)


    return X_train, X_test, Ca_train, Ca_test, P_train, P_test, pH_train, pH_test, SOC_train, SOC_test, Sand_train, Sand_test

These are the main pieces of the code

这些是代码的主要部分

This is the main part of Error output:

这是错误输出的主要部分：

ValueError                                Traceback (most recent call last)
<ipython-input-90-1315d47e2551> in <module>()
     20         print '####################'
     21         print featuresV[1]
---> 22         performParameterSelection(model, score, X_test, Ca_test,  X_train, Ca_train)
     23         print featuresV[2]
     24         performParameterSelection(model, score, X_test, P_test, X_train, P_train)

<ipython-input-41-7075e1a49412> in performParameterSelection(model_name, criteria, X_test, y_test, X_train, y_train)
     12     # cv=5 - constant; verbose - keep writing
     13 
---> 14     gs.fit(X_train, y_train) # Will get grid scores with outputs from ALL models described above
     15 
     16         #pprint(sorted(gs.grid_scores_, key=lambda x: -x.mean_validation_score))

C:\Users\Tony\Anaconda\lib\site-packages\sklearn\grid_search.pyc in fit(self, X, y)
    730 
    731         """
--> 732         return self._fit(X, y, ParameterGrid(self.param_grid))



     90     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
     91                        "multilabel-sequences"]):
---> 92         raise ValueError("{0} is not supported".format(y_type))
     93 
     94     if y_type in ["binary", "multiclass"]:

 ValueError: continuous is not supported

Here is some data after using soil.head(15). It does not show all the columns but it should behave in the same way with 8 features instead of 17. As for target: these are the last 5 columns but the code here calculated only one (Ca)

这是使用soil.head(15)后的一些数据。它没有显示所有列，但它的行为方式应该与 8 个特征相同，而不是 17 个。至于目标：这些是最后 5 列，但此处的代码仅计算了一个 (Ca)

    BSAN    BSAS    BSAV    CTI ELEV    EVI LSTD    LSTN    REF1    REF2    ... RELI    Subsoil Topsoil TMAP    TMFI    Ca  P   pH  SOC Sand
PIDN                                                                                    
92RkYor6    -0.405797   -0.563636   -0.806271   -0.228241   -0.691982     1.653790  -0.605889   0.627488    -0.856727   0.056586    ... -0.062181   0     1 0.896228    1.651807    -0.394962   0.031291    0.488676    -0.389042   0.630347
nPv9P04t    -0.688406   -0.709091   -0.739082   -0.189180   1.185523    0.395773    -0.381748   -0.338928   -0.774545   -0.818182   ... 2.995923    1   0   1.539208    1.618022    -0.460044   -0.366432   -0.549490   0.204798    -1.162260
oCASbXEx    -0.623188   -0.654545   -0.727884   -0.155835   0.711136    0.517493    -0.035002   -0.092554   -0.725818   -0.651206   ... -0.300034   1   0   0.286952    0.657765    0.259613    -0.407934   0.591558    -0.529688   -0.793082
xq94dGBz    -0.746377   -0.781818   -0.862262   -0.340427   0.791314    0.672741    -0.665032   -0.128613   -0.853091   -0.741187   ... -0.418960   0     1 0.276740    0.678724    -0.467854   -0.245386   -0.577548   -0.428111   -0.130845
GYSYA8Yf    -0.862319   -0.836364   -0.783875   -0.020427   4.715590    0.473032    -1.321194   -2.560069   -0.791273   -0.827458   ... 2.299354    1   0   0.583042    1.825040    1.442361    -0.328389   0.797320    -0.443738   -0.892037
G4e9Ahvi    -0.710145   -0.736364   -0.727884   -0.175122   -1.003786   0.744898    -0.678329   0.851702    -0.661818   -0.474954   ... -0.300034   1   0   1.544703    1.641861    -0.355335   -0.079380   -0.287610   -0.256209   0.287810
SHU443XO    -0.579710   -0.736364   -0.963046   -0.536744   -0.179733   1.793003    -0.914052   0.291898    -0.966545   -0.086271   ... 0.260618    0   1   1.840689    2.223996    -0.499961   0.155796    -0.886192   -0.107749   0.942435
oAeygDKu    -0.152174   -0.154545   -0.134378   1.252267    -0.796659   -0.155977   1.309391    0.642680    -0.205818   -0.341373   ... -0.537887   1   0   -0.320335   0.429981    -0.441821   -0.352598   0.339031    -0.826609   1.650344
agBvYkUI    -0.724638   -0.790909   -0.839866   0.114245    1.363697    0.726676    -1.687885   0.060034    -0.706909   -0.523191   ... 1.127081    1   0   1.254782    0.972442    -0.505456   -0.345681   -1.774712   0.071966    -1.207931
8ujcZd8d    -0.427536   -0.600000   -0.806271   -0.667808   -1.208686   2.008018    -1.276453   1.203854    -0.698182   0.224490    ... 0.107713    0   1   0.288463    0.013744    -0.362277   -0.338764   0.039740    -0.232768   0.451467
hqO5LhmQ    -0.644928   -0.690909   -0.772676   -0.195877   1.138753    0.390671    0.145537    -0.544813   -0.722909   -0.729128   ... -0.537887   0   1   0.153926    0.422784    -0.460333   -0.300721   -0.063142   -0.607825   1.208852
QsfH8CWp    -0.449275   -0.618182   -0.862262   -0.512923   -0.712027   1.537901    -0.665190   0.595265    -0.884364   -0.103896   ... -0.028203   1   0   0.896228    1.651807    -0.475953   -0.252303   -0.128612   -0.670335   0.786391
5hhEGbrX    -0.260870   -0.290909   -0.335946   -0.175122   -0.749889   0.400146    0.299908    0.567983    -0.423273   -0.244898   ... -0.520897   1   0   0.249117    0.907095    -0.142446   -0.397558   0.423206    -0.412483   -0.678903
XlJWsmdz    -0.768116   -0.800000   -0.873460   -0.737115   0.682183    1.013848    -1.013065   -0.376346   -0.837818   -0.544527   ... 1.619776    1   0   0.942437    1.482143    -0.358517   1.283256    -0.072494   -0.490620   -0.899649
FY3riRgw    -0.818841   -0.863636   -0.873460   -0.739177   1.715590    1.434402    -1.669818   -0.090647   -0.874909   -0.388683   ... 3.182807    0   1   1.254782    0.972442    -0.333063   0.020916    -0.942309   1.314342    -0.690321

15 rows × 22 columns

15 行 × 22 列

Answer 1

回答by Sergey Bushmanov

Your error continuous is not supportedtells me you're trying to do "something" from regression domain on classification domain.

您的错误continuous is not supported告诉我您正在尝试从分类域上的回归域做“某事”。

At least 1 thing captures my eye as your target is regression:

至少有一件事情引起了我的注意，因为您的目标是回归：

 scores = ['precision', 'recall']

To start with, both have nothing to do with regression (as @zero323 pointed out in a comment to your question): they are accuracy measures for classification. Try any regression scores that suit your tastes from thissklearn docs page, section "3.3.1.1. Common cases: predefined values"

首先，两者都与回归无关（正如@zero323 在对您的问题的评论中指出的那样）：它们是分类的准确性度量。从这个sklearn 文档页面，“3.3.1.1. 常见情况：预定义值”部分尝试任何适合您口味的回归分数

As far as the rest of the code is concerned, I would strongly encourage you to rewrite your code from scratch: chunk for Lasso, chunk for Ridge, chunk for ElasticNet and chunk for SVM (why would you run Ridge and Lasso separately from ElasticNet as they are special cases of ElasticNet???). This will take you no more than 10-15 lines of code. Only after you made it sure all of them execute, optimal hyperparameters are found, and desirable regression metrics are calculated I would attempt optimizing the code and putting everything together in a loop.

就其余代码而言，我强烈建议您从头开始重写您的代码：用于套索的块，用于脊的块，用于 ElasticNet 的块和用于 SVM 的块（为什么您将 Ridge 和 Lasso 与 ElasticNet 分开运行作为它们是 ElasticNet 的特例？？？）。这将不会超过 10-15 行代码。只有在确定所有这些都执行后，找到最佳超参数，并计算出理想的回归指标，我才会尝试优化代码并将所有内容放在一个循环中。

PS:

PS：

how are these loops supposed to run:

这些循环应该如何运行：

for score in scores:
  for model in MODELS.keys():

prior to defining MODELS?

在定义之前MODELS？

pandas ValueError：不支持连续

提问by Toly

回答by Sergey Bushmanov

相关推荐

最近更新

标签

pandas ValueError：不支持连续

提问by Toly

回答by Sergey Bushmanov

相关推荐

pandas 替换熊猫数据框列中的子字符串

列子集和过滤 Pandas

使用 Pandas 导入多个 SQL 表

Pandas：如何过滤在数据框中出现多次的项目

相关推荐

最近更新

标签