scala - Customize SparkContext using sparkConf.set(..) when using spark-shell

Disclaimer: this page is a translation of a popular StackOverflow question and answer, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/31397731/

Customize SparkContext using sparkConf.set(..) when using spark-shell

Tags: scala, apache-spark, serialization, kryo

Asked by rake

In Spark, there are three primary ways to specify the options for the SparkConf used to create the SparkContext:

  1. As properties in the conf/spark-defaults.conf
    • e.g., the line: spark.driver.memory 4g
  2. As args to spark-shell or spark-submit
    • e.g., spark-shell --driver-memory 4g ...
  3. In your source code, configuring a SparkConf instance before using it to create the SparkContext (a short sketch follows this list):
    • e.g., sparkConf.set( "spark.driver.memory", "4g" )
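
For illustration, a minimal sketch of option #3 in a standalone driver program might look like the following (the app name and memory value are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Build and customize the configuration before any SparkContext exists
val sparkConf = new SparkConf()
  .setAppName("my-app")                  // placeholder application name
  .set("spark.driver.memory", "4g")      // option #3: set properties in code

val sc = new SparkContext(sparkConf)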

However, when using spark-shell, the SparkContext is already created for you by the time you get a shell prompt, in the variable named sc. When using spark-shell, how do you use option #3 in the list above to set configuration options, if the SparkContext is already created before you have a chance to execute any Scala statements?

In particular, I am trying to use Kryo serialization and GraphX. The prescribed way to use Kryo with GraphX is to execute the following Scala statement when customizing the SparkConf instance:

GraphXUtils.registerKryoClasses( sparkConf )
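
For context, in a standalone application that statement would sit alongside the rest of the SparkConf setup, roughly like this sketch (assuming spark-graphx is on the classpath; the app name is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.graphx.GraphXUtils

val sparkConf = new SparkConf()
  .setAppName("graphx-kryo")  // placeholder application name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

// Registers GraphX's internal classes with Kryo on this conf
GraphXUtils.registerKryoClasses(sparkConf)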

How do I accomplish this when running spark-shell?

Answered by zero323

Spark 2.0+

You should be able to use the SparkSession.conf.set method to set some configuration options at runtime, but it is mostly limited to SQL configuration.

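In a 2.0+ shell the session is exposed as spark, so for example (the property and value below are just an illustration):

// Runtime-settable options are mostly spark.sql.* properties
spark.conf.set("spark.sql.shuffle.partitions", "8")
spark.conf.get("spark.sql.shuffle.partitions")   // => "8"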

Spark < 2.0

You can simply stop an existing context and create a new one:

import org.apache.spark.{SparkContext, SparkConf}

sc.stop()  // stop the context spark-shell created for you
val conf = new SparkConf().set("spark.executor.memory", "4g")
val sc = new SparkContext(conf)  // the new context picks up the custom conf
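
Applying the same pattern to the Kryo/GraphX case from the question, a sketch (assuming GraphX is available in the shell) would be:

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx.GraphXUtils

sc.stop()
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
GraphXUtils.registerKryoClasses(conf)  // register GraphX's classes with Kryo
val sc = new SparkContext(conf)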

As you can read in the official documentation:

once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.

So as you can see, stopping the context is the only applicable option once the shell has been started.

You can always use configuration files or the --conf argument to spark-shell to set the required parameters, which will be used by the default context. In the case of Kryo you should take a look at the following properties (an example invocation follows the list):

  • spark.kryo.classesToRegister
  • spark.kryo.registrator
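
A sketch of such an invocation (com.example.MyKryoRegistrator is a placeholder for your own registrator class; spark.kryo.classesToRegister can be passed the same way):

spark-shell \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=com.example.MyKryoRegistrator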

See Compression and Serialization in Spark Configuration.
