Scala: How to extract the best parameters from a CrossValidatorModel

Question

Asked by Mohammad

I want to find the parameters of ParamGridBuilder that make the best model in CrossValidator in Spark 1.4.x.

In the Pipeline Example in the Spark documentation, they add different parameters (numFeatures, regParam) by using ParamGridBuilder in the Pipeline. Then they obtain the best model with the following line of code:

val cvModel = crossval.fit(training.toDF)

Now I want to know which parameters (numFeatures, regParam) from ParamGridBuilder produce the best model.

I already used the following commands without success:

cvModel.bestModel.extractParamMap().toString()
cvModel.params.toList.mkString("(", ",", ")")
cvModel.estimatorParamMaps.toString()
cvModel.explainParams()
cvModel.getEstimatorParamMaps.mkString("(", ",", ")")
cvModel.toString()

Any help?

Thanks in advance,

Answer 1

Answered by Adam Vogel

One way to get a proper ParamMap object is to use CrossValidatorModel.avgMetrics: Array[Double] to find the argmax ParamMap:

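import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidatorModel

// Pairs each candidate ParamMap with its average cross-validation metric
// and returns the ParamMap with the highest metric.
implicit class BestParamMapCrossValidatorModel(cvModel: CrossValidatorModel) {
  def bestEstimatorParamMap: ParamMap = {
    cvModel.getEstimatorParamMaps
           .zip(cvModel.avgMetrics)
           .maxBy(_._2)
           ._1
  }
}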

When run on the CrossValidatorModel trained in the Pipeline Example you cited, this gives:

scala> println(cvModel.bestEstimatorParamMap)
{
   hashingTF_2b0b8ccaeeec-numFeatures: 100,
   logreg_950a13184247-regParam: 0.1
}
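
The maxBy above is only right when a larger metric is better (for example areaUnderROC, the default metric of BinaryClassificationEvaluator used in the Pipeline Example). For error metrics such as RMSE, the best grid point is the one with the smallest average metric; later Spark versions expose this choice as Evaluator.isLargerBetter. Below is a minimal plain-Scala sketch of the selection step, assuming a Map[String, Any] as a hypothetical stand-in for Spark's ParamMap so it runs without Spark. The object name and all numbers are made up for illustration.

```scala
object BestGridPoint {
  // Hypothetical stand-in for Spark's ParamMap: param name -> value.
  type ParamMapLike = Map[String, Any]

  // Pairs every candidate with its average cross-validation metric and keeps
  // the best one. `largerIsBetter` plays the role of Evaluator.isLargerBetter.
  def best(paramMaps: Seq[ParamMapLike],
           avgMetrics: Seq[Double],
           largerIsBetter: Boolean): (ParamMapLike, Double) = {
    require(paramMaps.nonEmpty, "no candidates")
    require(paramMaps.length == avgMetrics.length, "need one metric per ParamMap")
    val scored = paramMaps.zip(avgMetrics)
    if (largerIsBetter) scored.maxBy(_._2) else scored.minBy(_._2)
  }

  def main(args: Array[String]): Unit = {
    val grid: Seq[ParamMapLike] = Seq(
      Map("numFeatures" -> 10, "regParam" -> 0.1),
      Map("numFeatures" -> 100, "regParam" -> 0.1),
      Map("numFeatures" -> 100, "regParam" -> 0.01))
    val auc = Seq(0.71, 0.93, 0.88) // made-up scores, larger is better
    val rmse = Seq(2.4, 1.1, 0.9)   // made-up errors, smaller is better
    println(best(grid, auc, largerIsBetter = true))
    println(best(grid, rmse, largerIsBetter = false))
  }
}
```

With a real CrossValidatorModel, the arguments would come from cvModel.getEstimatorParamMaps, cvModel.avgMetrics and, where your Spark version has it, cvModel.getEvaluator.isLargerBetter.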

Answer 2

Answered by macfeliga

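import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.HashingTF

val bestPipelineModel = cvModel.bestModel.asInstanceOf[PipelineModel]
val stages = bestPipelineModel.stages

// In the Pipeline Example, stage 0 is the Tokenizer, 1 the HashingTF, 2 the fitted logistic regression
val hashingStage = stages(1).asInstanceOf[HashingTF]
println("numFeatures = " + hashingStage.getNumFeatures)

val lrStage = stages(2).asInstanceOf[LogisticRegressionModel]
println("regParam = " + lrStage.getRegParam)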
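
Indexing with stages(1) and stages(2) relies on knowing the exact layout of the pipeline. An alternative is to look a stage up by its type with collectFirst, which returns None instead of throwing when the stage is missing. This is a minimal sketch that uses hypothetical case classes in place of Spark's fitted stages so it runs without Spark. With Spark, where stages is an Array[Transformer], the same pattern would read bestPipelineModel.stages.collectFirst { case m: LogisticRegressionModel => m }.

```scala
object FindStage {
  // Hypothetical stand-ins for the fitted stages of a PipelineModel.
  sealed trait Stage
  final case class TokenizerStage(outputCol: String) extends Stage
  final case class HashingTFStage(numFeatures: Int) extends Stage
  final case class LogRegModelStage(regParam: Double) extends Stage

  // Select stages by type rather than by position in the pipeline.
  def numFeatures(stages: Seq[Stage]): Option[Int] =
    stages.collectFirst { case h: HashingTFStage => h.numFeatures }

  def regParam(stages: Seq[Stage]): Option[Double] =
    stages.collectFirst { case m: LogRegModelStage => m.regParam }

  def main(args: Array[String]): Unit = {
    val stages: Seq[Stage] =
      Seq(TokenizerStage("words"), HashingTFStage(100), LogRegModelStage(0.1))
    println(s"numFeatures = ${numFeatures(stages)}")
    println(s"regParam = ${regParam(stages)}")
  }
}
```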

source

Answer 3

Answered by Algorithman

To print everything in the ParamMap, you actually don't have to call parent:

cvModel.bestModel.extractParamMap()

To answer the OP's question and get a single best parameter, for example regParam:

cvModel.bestModel.extractParamMap().apply(cvModel.bestModel.getParam("regParam"))
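
Spark documents extractParamMap as merging default values, user-supplied values and any extra values into one flat map, with later sources winning on conflicts (default < user-supplied < extra). That is why the call above returns both the values set from the grid and defaults that were never touched. Below is a minimal plain-Scala sketch of that precedence, assuming Map[String, Any] as a hypothetical stand-in for ParamMap; the values are illustrative, not Spark's actual defaults.

```scala
object ExtractParamMapSketch {
  // Same precedence as documented for Params.extractParamMap:
  // defaults < user-supplied values < extra values (the right-hand side of ++ wins).
  def extract(defaults: Map[String, Any],
              userSupplied: Map[String, Any],
              extra: Map[String, Any] = Map.empty): Map[String, Any] =
    defaults ++ userSupplied ++ extra

  def main(args: Array[String]): Unit = {
    val defaults = Map[String, Any]("maxIter" -> 100, "regParam" -> 0.0) // illustrative
    val fromGrid = Map[String, Any]("regParam" -> 0.1)                   // illustrative
    val merged = extract(defaults, fromGrid)
    println(merged("regParam")) // the grid value overrides the default
    println(merged("maxIter"))  // the untouched default is still present
  }
}
```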

Answer 4

Answered by Mazen Aly

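This is how you get the chosen parameters when the CrossValidator's estimator is a LogisticRegression itself (not a Pipeline). Because bestModel is typed as Model[_], it has to be cast first:

import org.apache.spark.ml.classification.LogisticRegressionModel
val lrModel = cvModel.bestModel.asInstanceOf[LogisticRegressionModel]
println(lrModel.getMaxIter)
println(lrModel.getRegParam)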

Answer 5

Answered by orangeHIX

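This Java code should work: cvModel.bestModel().parent().extractParamMap(). In Scala it becomes cvModel.bestModel.parent.extractParamMap(). The parent() method returns the estimator that produced the best model; since CrossValidator refits the estimator with the winning ParamMap, that parent holds the best parameter values.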

Answer 6

Answered by u6020995

This is the ParamGridBuilder (PySpark):

paramGrid = (ParamGridBuilder()
             .addGrid(hashingTF.numFeatures, [10, 100, 1000])
             .addGrid(lr.regParam, [0.1, 0.01, 0.001])
             .build())
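
ParamGridBuilder builds the Cartesian product of all the values added with addGrid, so this 3 × 3 grid yields 9 ParamMaps. CrossValidator computes one average metric per ParamMap, so avgMetrics has 9 entries aligned with getEstimatorParamMaps. Below is a minimal plain-Scala sketch of that expansion, written in Scala like the rest of the thread; the object name is hypothetical, and the sketch does not claim to reproduce the exact ordering Spark uses.

```scala
object GridExpansion {
  // Expands named value lists into every combination, the way ParamGridBuilder.build() does.
  def cartesian(grid: Seq[(String, Seq[Any])]): Seq[Map[String, Any]] =
    grid.foldLeft(Seq(Map.empty[String, Any])) { case (acc, (name, values)) =>
      for (partial <- acc; v <- values) yield partial + (name -> v)
    }

  def main(args: Array[String]): Unit = {
    val combos = cartesian(Seq(
      "numFeatures" -> Seq(10, 100, 1000),
      "regParam" -> Seq(0.1, 0.01, 0.001)))
    println(combos.size)
    combos.foreach(println)
  }
}
```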

There are 3 stages in the pipeline. It seems we can access the parameters as follows:

for stage in cv_model.bestModel.stages:
    print('stage: {}'.format(stage))
    print(stage.params)
    print('\n')

stage: Tokenizer_46ffb9fac5968c6c152b
[Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='inputCol', doc='input column name'), Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='outputCol', doc='output column name')]

stage: HashingTF_40e1af3ba73764848d43
[Param(parent='HashingTF_40e1af3ba73764848d43', name='inputCol', doc='input column name'), Param(parent='HashingTF_40e1af3ba73764848d43', name='numFeatures', doc='number of features'), Param(parent='HashingTF_40e1af3ba73764848d43', name='outputCol', doc='output column name')]

stage: LogisticRegression_451b8c8dbef84ecab7a9
[]

However, no parameters are listed for the last stage, LogisticRegression.

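We can still read values directly from the fitted stages, for example numFeatures from HashingTF and the weights and intercept of the logistic regression model (later Spark versions use coefficients instead of weights):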

cv_model.bestModel.stages[1].getNumFeatures()
10
cv_model.bestModel.stages[2].intercept
1.5791827733883774
cv_model.bestModel.stages[2].weights
DenseVector([-2.5361, -0.9541, 0.4124, 4.2108, 4.4707, 4.9451, -0.3045, 5.4348, -0.1977, -1.8361])

Full exploration: http://kuanliang.github.io/2016-06-07-SparkML-pipeline/

Answer 7

Answered by François

I am working with Spark Scala 1.6.x, and here is a full example of how I set up and fit a CrossValidator and then return the value of the parameter used to get the best model (assuming that training.toDF gives a DataFrame ready to be used):

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// Instantiate a LogisticRegression object
val lr = new LogisticRegression()

// Instantiate a ParamGrid with different values for the 'regParam' parameter of the logistic regression
val paramGrid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.0001, 0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 1)).build()

// Set up and fit the CrossValidator on the training set, using 'MulticlassClassificationEvaluator' as evaluator
val crossVal = new CrossValidator().setEstimator(lr).setEvaluator(new MulticlassClassificationEvaluator).setEstimatorParamMaps(paramGrid)
val cvModel = crossVal.fit(training.toDF)

// Get the value of the 'regParam' used to get the best model
val bestModel = cvModel.bestModel                        // Getting the best model
val paramReference = bestModel.getParam("regParam")      // Getting the reference of the parameter you want (only the reference, not the value)
val paramValue = bestModel.getOrDefault(paramReference)  // Getting the value itself (falls back to the default if unset)
print(paramValue)                                        // In my case : 0.001

You can do the same for any parameter or any other type of model.

Answer 8

Answered by u10437407

If you are using Java, debugging shows that the best parameters can be read with:

bestModel.parent().extractParamMap()

Answer 9

Answered by Jorge M. Londoño P.

Building on @macfeliga's solution, here is a one-liner that works for pipelines:

cvModel.bestModel.asInstanceOf[PipelineModel]
    .stages.foreach(stage => println(stage.extractParamMap))

Answer 10

Answered by panc

This SO thread more or less answers the question.

In a nutshell, you need to cast each object to the class it is supposed to be.

For the case of CrossValidatorModel, the following is what I did:

import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.regression.RandomForestRegressionModel

// Load CV model from S3
val inputModelPath = "s3://path/to/my/random-forest-regression-cv"
val reloadedCvModel = CrossValidatorModel.load(inputModelPath)

// To get the parameters of the best model
(
    reloadedCvModel.bestModel
        .asInstanceOf[PipelineModel]
        .stages(1)
        .asInstanceOf[RandomForestRegressionModel]
        .extractParamMap()
)

In the example, my pipeline has two stages (a VectorIndexer and a RandomForestRegressor), so the stage index is 1 for my model.
