scala - Customize SparkContext using sparkConf.set(..) when using spark-shell

Disclaimer: this page is a translation of a popular StackOverflow question and answer, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/31397731/

Customize SparkContext using sparkConf.set(..) when using spark-shell

Tags: scala, apache-spark, serialization, kryo

Asked by rake

In Spark, there are three primary ways to specify the options for the SparkConf used to create the SparkContext:

  1. As properties in the conf/spark-defaults.conf
    • e.g., the line: spark.driver.memory 4g
  2. As args to spark-shell or spark-submit
    • e.g., spark-shell --driver-memory 4g ...
  3. In your source code, configuring a SparkConf instance before using it to create the SparkContext (a short sketch follows this list):
    • e.g., sparkConf.set( "spark.driver.memory", "4g" )
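
For illustration, a minimal sketch of option #3 in a standalone driver program might look like the following (the app name and memory value are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Build and customize the configuration before any SparkContext exists
val sparkConf = new SparkConf()
  .setAppName("my-app")                  // placeholder application name
  .set("spark.driver.memory", "4g")      // option #3: set properties in code

val sc = new SparkContext(sparkConf)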

However, when using spark-shell, the SparkContext is already created for you by the time you get a shell prompt, in the variable named sc. When using spark-shell, how do you use option #3 in the list above to set configuration options, if the SparkContext is already created before you have a chance to execute any Scala statements?

In particular, I am trying to use Kryo serialization and GraphX. The prescribed way to use Kryo with GraphX is to execute the following Scala statement when customizing the SparkConf instance:

GraphXUtils.registerKryoClasses( sparkConf )
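
For context, in a standalone application that statement would sit alongside the rest of the SparkConf setup, roughly like this sketch (assuming spark-graphx is on the classpath; the app name is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.graphx.GraphXUtils

val sparkConf = new SparkConf()
  .setAppName("graphx-kryo")  // placeholder application name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

// Registers GraphX's internal classes with Kryo on this conf
GraphXUtils.registerKryoClasses(sparkConf)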

How do I accomplish this when running spark-shell?

Answered by zero323

Spark 2.0+

You should be able to use the SparkSession.conf.set method to set some configuration options at runtime, but it is mostly limited to SQL configuration.

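In a 2.0+ shell the session is exposed as spark, so for example (the property and value below are just an illustration):

// Runtime-settable options are mostly spark.sql.* properties
spark.conf.set("spark.sql.shuffle.partitions", "8")
spark.conf.get("spark.sql.shuffle.partitions")   // => "8"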

Spark < 2.0

You can simply stop an existing context and create a new one:

import org.apache.spark.{SparkContext, SparkConf}

sc.stop()  // stop the context spark-shell created for you
val conf = new SparkConf().set("spark.executor.memory", "4g")
val sc = new SparkContext(conf)  // the new context picks up the custom conf
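
Applying the same pattern to the Kryo/GraphX case from the question, a sketch (assuming GraphX is available in the shell) would be:

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.graphx.GraphXUtils

sc.stop()
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
GraphXUtils.registerKryoClasses(conf)  // register GraphX's classes with Kryo
val sc = new SparkContext(conf)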

As you can read in the official documentation:

once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.

So as you can see, stopping the context is the only applicable option once the shell has been started.

You can always use configuration files or the --conf argument to spark-shell to set the required parameters, which will be used by the default context. In the case of Kryo you should take a look at the following properties (an example invocation follows the list):

  • spark.kryo.classesToRegister
  • spark.kryo.registrator
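
A sketch of such an invocation (com.example.MyKryoRegistrator is a placeholder for your own registrator class; spark.kryo.classesToRegister can be passed the same way):

spark-shell \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=com.example.MyKryoRegistrator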

See Compression and Serialization in Spark Configuration.
