How to extract best parameters from a CrossValidatorModel (Scala)
Original: http://stackoverflow.com/questions/31749593/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by Mohammad
I want to find the ParamGridBuilder parameters that produce the best model in CrossValidator in Spark 1.4.x.
In the Pipeline Example in the Spark documentation, different parameter values (numFeatures, regParam) are added to the pipeline using ParamGridBuilder. The best model is then produced by the following line of code:
val cvModel = crossval.fit(training.toDF)
Now I want to know which parameter values (numFeatures, regParam) from ParamGridBuilder produce the best model.
I have already tried the following commands without success:
cvModel.bestModel.extractParamMap().toString()
cvModel.params.toList.mkString("(", ",", ")")
cvModel.estimatorParamMaps.toString()
cvModel.explainParams()
cvModel.getEstimatorParamMaps.mkString("(", ",", ")")
cvModel.toString()
Any help?
Thanks in advance,
Answered by Adam Vogel
One way to get a proper ParamMap object is to use CrossValidatorModel.avgMetrics: Array[Double] to find the argmax ParamMap:
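import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidatorModel

implicit class BestParamMapCrossValidatorModel(cvModel: CrossValidatorModel) {
  // Pair each candidate ParamMap with its average metric and keep the highest-scoring one
  def bestEstimatorParamMap: ParamMap = {
    cvModel.getEstimatorParamMaps
      .zip(cvModel.avgMetrics)
      .maxBy(_._2)
      ._1
  }
}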
When run on the CrossValidatorModel trained in the Pipeline Example you cited, this gives:
scala> println(cvModel.bestEstimatorParamMap)
{
hashingTF_2b0b8ccaeeec-numFeatures: 100,
logreg_950a13184247-regParam: 0.1
}
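One caveat with the argmax above: maxBy assumes that a larger metric is better, which holds for metrics such as areaUnderROC but not for error metrics such as RMSE. Below is a minimal sketch of a direction-aware selection, written against plain Scala collections so that it runs without Spark. bestIndex is a hypothetical helper, the metric values are made up, and the commented usage assumes a Spark version that exposes Evaluator.isLargerBetter (1.5 or later, to my knowledge).

```scala
// Hypothetical helper: index of the best metric, honoring the metric's direction.
def bestIndex(metrics: Seq[Double], largerIsBetter: Boolean): Int = {
  require(metrics.nonEmpty, "metrics must not be empty")
  val indexed = metrics.zipWithIndex
  val (_, idx) = if (largerIsBetter) indexed.maxBy(_._1) else indexed.minBy(_._1)
  idx
}

// Made-up stand-ins for cvModel.avgMetrics.
val accuracyLike = Seq(0.71, 0.93, 0.88)
val rmseLike = Seq(2.4, 1.7, 0.9)

println(bestIndex(accuracyLike, largerIsBetter = true)) // 1
println(bestIndex(rmseLike, largerIsBetter = false))    // 2

// With Spark this would be used roughly as:
//   val i = bestIndex(cvModel.avgMetrics, cvModel.getEvaluator.isLargerBetter)
//   val bestParams = cvModel.getEstimatorParamMaps(i)
```

Returning an index rather than the ParamMap keeps the helper independent of Spark types; the index lines up with getEstimatorParamMaps because avgMetrics is stored in the same order.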
Answered by macfeliga
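import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.HashingTF

// Stage order follows the Pipeline Example: tokenizer (0), hashingTF (1), lr (2)
val bestPipelineModel = cvModel.bestModel.asInstanceOf[PipelineModel]
val stages = bestPipelineModel.stages
val hashingStage = stages(1).asInstanceOf[HashingTF]
println("numFeatures = " + hashingStage.getNumFeatures)
val lrStage = stages(2).asInstanceOf[LogisticRegressionModel]
println("regParam = " + lrStage.getRegParam)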
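The hard-coded stage indices above break if the pipeline layout changes. A possible alternative, sketched here under the assumption that the pipeline contains a HashingTF and a LogisticRegressionModel as in the Pipeline Example, is to match on the stage type instead. tunedValues is a hypothetical helper name, not part of Spark.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.HashingTF

// Hypothetical helper: describe the tuned values by matching on stage type,
// so the result does not depend on where each stage sits in the pipeline.
// Stages of any other type are skipped by collect.
def tunedValues(stages: Seq[Transformer]): Seq[String] = stages.collect {
  case h: HashingTF                => s"numFeatures = ${h.getNumFeatures}"
  case lr: LogisticRegressionModel => s"regParam = ${lr.getRegParam}"
}

// Usage, assuming cvModel wraps a PipelineModel as in the question:
//   val stages = cvModel.bestModel.asInstanceOf[PipelineModel].stages
//   tunedValues(stages).foreach(println)
```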
Answered by Algorithman
To print everything in the ParamMap, you don't actually have to call parent:
cvModel.bestModel.extractParamMap()
To answer the OP's question and get a single best parameter, for example regParam:
cvModel.bestModel.extractParamMap().apply(cvModel.bestModel.getParam("regParam"))
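Note that this lookup works when bestModel is the model that owns the parameter. When bestModel is a PipelineModel, as in the question, getParam("regParam") should throw a NoSuchElementException, because the parameter lives on one of the stages. A minimal sketch of a name-based lookup that handles both cases follows; findParam is a hypothetical helper, not a Spark API.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.param.Params

// Hypothetical helper: look up a param by name on a model, or on every stage
// when the model is a PipelineModel. Returns (stage uid, value) pairs.
def findParam(model: Params, name: String): Seq[(String, Any)] = {
  val candidates: Seq[Params] = model match {
    case pm: PipelineModel => pm.stages.toSeq
    case other             => Seq(other)
  }
  for {
    stage <- candidates
    if stage.hasParam(name) // getParam would throw on stages without this param
    value <- stage.extractParamMap().get(stage.getParam(name))
  } yield (stage.uid, value)
}

// Usage, assuming cvModel is the fitted CrossValidatorModel from the question:
//   findParam(cvModel.bestModel, "regParam").foreach(println)
```

Going through extractParamMap() means default values are reported as well as explicitly set ones.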
Answered by Mazen Aly
This is how you get the chosen parameters:
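// bestModel is typed as Model[_], so cast it first (assuming the estimator was a LogisticRegression)
val best = cvModel.bestModel.asInstanceOf[org.apache.spark.ml.classification.LogisticRegressionModel]
println(best.getMaxIter)
println(best.getRegParam)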
Answered by orangeHIX
This Java code should work:
cvModel.bestModel().parent().extractParamMap()
You can translate it to Scala. The parent() method returns the estimator, from which you can then read the best params.
Answered by u6020995
This is the ParamGridBuilder():
paraGrid = ParamGridBuilder().addGrid(
hashingTF.numFeatures, [10, 100, 1000]
).addGrid(
lr.regParam, [0.1, 0.01, 0.001]
).build()
There are 3 stages in the pipeline. It seems we can access the parameters as follows:
for stage in cv_model.bestModel.stages:
    print('stage: {}'.format(stage))
    print(stage.params)
    print('\n')
stage: Tokenizer_46ffb9fac5968c6c152b
[Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='inputCol', doc='input column name'), Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='outputCol', doc='output column name')]
stage: HashingTF_40e1af3ba73764848d43
[Param(parent='HashingTF_40e1af3ba73764848d43', name='inputCol', doc='input column name'), Param(parent='HashingTF_40e1af3ba73764848d43', name='numFeatures', doc='number of features'), Param(parent='HashingTF_40e1af3ba73764848d43', name='outputCol', doc='output column name')]
stage: LogisticRegression_451b8c8dbef84ecab7a9
[]
However, no parameters are listed for the last stage, LogisticRegression.
We can also get numFeatures from the HashingTF stage, and the weights and intercept from the logistic regression stage, like this:
cv_model.bestModel.stages[1].getNumFeatures()
10
cv_model.bestModel.stages[2].intercept
1.5791827733883774
cv_model.bestModel.stages[2].weights
DenseVector([-2.5361, -0.9541, 0.4124, 4.2108, 4.4707, 4.9451, -0.3045, 5.4348, -0.1977, -1.8361])
Full exploration: http://kuanliang.github.io/2016-06-07-SparkML-pipeline/
Answered by François
I am working with Spark Scala 1.6.x, and here is a full example of how I set up and fit a CrossValidator and then return the value of the parameter used to get the best model (assuming that training.toDF gives a DataFrame ready to be used):
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
// Instantiate a LogisticRegression object
val lr = new LogisticRegression()
// Instantiate a ParamGrid with different values for the 'regParam' parameter of the logistic regression
val paramGrid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.0001, 0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 1)).build()
// Setting and fitting the CrossValidator on the training set, using 'MulticlassClassificationEvaluator' as evaluator
val crossVal = new CrossValidator().setEstimator(lr).setEvaluator(new MulticlassClassificationEvaluator).setEstimatorParamMaps(paramGrid)
val cvModel = crossVal.fit(training.toDF)
// Getting the value of the 'regParam' used to get the best model
val bestModel = cvModel.bestModel // Getting the best model
val paramReference = bestModel.getParam("regParam") // Getting the reference of the parameter you want (only the reference, not the value)
val paramValue = bestModel.get(paramReference) // Getting the value of this parameter (get returns an Option)
print(paramValue) // In my case: 0.001 (printed wrapped in an Option, as Some(0.001))
You can do the same for any parameter or any other type of model.
Answered by Jorge M. Londoño P.
Building on @macfeliga's solution, a one-liner that works for pipelines:
cvModel.bestModel.asInstanceOf[PipelineModel]
.stages.foreach(stage => println(stage.extractParamMap))
Answered by panc
This SO thread kind of answers the question.
In a nutshell, you need to cast each object to its supposed-to-be class.
For the case of CrossValidatorModel, the following is what I did:
import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.regression.RandomForestRegressionModel
// Load CV model from S3
val inputModelPath = "s3://path/to/my/random-forest-regression-cv"
val reloadedCvModel = CrossValidatorModel.load(inputModelPath)
// To get the parameters of the best model
(
reloadedCvModel.bestModel
.asInstanceOf[PipelineModel]
.stages(1)
.asInstanceOf[RandomForestRegressionModel]
.extractParamMap()
)
In the example, my pipeline has two stages (a VectorIndexer and a RandomForestRegressor), so the stage index is 1 for my model.


