scala - How to reduce the verbosity of Spark's runtime output?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/28189408/

Date: 2020-10-22 06:52:33  Source: igfitidea

How to reduce the verbosity of Spark's runtime output?

scala, apache-spark

Asked by newBike

How to reduce the amount of trace info the Spark runtime produces?

The default output is too verbose.

How can I turn it off, and turn it back on when I need it?

Thanks

Verbose mode

scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
15/01/28 09:57:24 INFO SparkContext: Starting job: collect at <console>:15
15/01/28 09:57:24 INFO DAGScheduler: Got job 3 (collect at <console>:15) with 1 output 
...
15/01/28 09:57:24 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/01/28 09:57:24 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 626 bytes result sent to driver
15/01/28 09:57:24 INFO DAGScheduler: Stage 3 (collect at <console>:15) finished in 0.002 s
15/01/28 09:57:24 INFO DAGScheduler: Job 3 finished: collect at <console>:15, took 0.020061 s
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)

Silent mode (expected)

scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)

Accepted answer by Shyamendra Solanki

Quoting from the 'Learning Spark' book:

You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:

log4j.rootCategory=INFO, console

Then lower the log level so that we only show WARN messages and above by changing it to the following:

log4j.rootCategory=WARN, console

When you re-open the shell, you should see less output.
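
For reference, here is a sketch of what the copied conf/log4j.properties might look like after the edit. The appender settings below are taken from the template shipped with Spark 1.x and may differ slightly between versions; only the first property is the actual change.

# Set everything to be logged to the console (lowered from INFO to WARN)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet third-party logs that are too verbose (also present in the template)
log4j.logger.org.spark-project.jetty=WARN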

Answered by user5771281

Spark 1.4.1

sc.setLogLevel("WARN")

From the comments in the source code:

Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
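
Since this can be toggled from the REPL, here is an illustrative spark-shell session (output abbreviated) that switches the chatter off and back on when needed:

scala> sc.setLogLevel("ERROR")   // silence INFO/WARN noise
scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res0: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
scala> sc.setLogLevel("INFO")    // turn the detailed output back on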

Spark 2.x - 2.3.1

sparkSession.sparkContext().setLogLevel("WARN")

Spark 2.3.2

sparkSession.sparkContext.setLogLevel("WARN")

Answered by mrsrinivas

Logging configuration at the Spark app level

With this approach, no code change is needed in the cluster for a Spark application.

  • Let's create a new file log4j.properties from log4j.properties.template.
  • Then change the verbosity with the log4j.rootCategory property.
  • Say we only need to check the ERRORs of a given jar; then set log4j.rootCategory=ERROR, console.

The spark-submit command would be:

spark-submit \
    ... # Other Spark props go here
    --files prop/file/location \
    --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    jar/location \
    [application arguments]

Now you will only see the logs that are categorised as ERROR.



Plain Log4j way w/o Spark (but needs a code change)

Set logging OFF for the org and akka packages (the snippet below uses ERROR):

import org.apache.log4j.{Level, Logger}

Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)

Answered by Leo

In Unix you can always redirect stderr to /dev/null, i.e.:

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> /dev/null
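
Note that 2> /dev/null discards every error message as well. If you still want the logs for later inspection, you could redirect stderr to a file instead, e.g. (the file name here is just an example):

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> spark-stderr.log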