scala - How to reduce the verbosity of Spark's runtime output?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/28189408/

Date: 2020-10-22 06:52:33  Source: igfitidea

How to reduce the verbosity of Spark's runtime output?

scala, apache-spark

Asked by newBike

How to reduce the amount of trace info the Spark runtime produces?

The default output is too verbose.

How can I turn it off, and turn it back on when I need it?

Thanks

Verbose mode

scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
15/01/28 09:57:24 INFO SparkContext: Starting job: collect at <console>:15
15/01/28 09:57:24 INFO DAGScheduler: Got job 3 (collect at <console>:15) with 1 output 
...
15/01/28 09:57:24 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/01/28 09:57:24 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 626 bytes result sent to driver
15/01/28 09:57:24 INFO DAGScheduler: Stage 3 (collect at <console>:15) finished in 0.002 s
15/01/28 09:57:24 INFO DAGScheduler: Job 3 finished: collect at <console>:15, took 0.020061 s
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)

Silent mode (expected)

scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)

Accepted answer by Shyamendra Solanki

Quoting from the 'Learning Spark' book:

You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:

log4j.rootCategory=INFO, console

Then lower the log level so that we only show WARN messages and above by changing it to the following:

log4j.rootCategory=WARN, console

When you re-open the shell, you should see less output.
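
For reference, here is a sketch of what the copied conf/log4j.properties might look like after the edit. The appender settings below are taken from the template shipped with Spark 1.x and may differ slightly between versions; only the first property is the actual change.

# Set everything to be logged to the console (lowered from INFO to WARN)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet third-party logs that are too verbose (also present in the template)
log4j.logger.org.spark-project.jetty=WARN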

Answered by user5771281

Spark 1.4.1

sc.setLogLevel("WARN")

From the comments in the source code:

Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
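
Since this can be toggled from the REPL, here is an illustrative spark-shell session (output abbreviated) that switches the chatter off and back on when needed:

scala> sc.setLogLevel("ERROR")   // silence INFO/WARN noise
scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res0: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
scala> sc.setLogLevel("INFO")    // turn the detailed output back on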

Spark 2.x - 2.3.1

sparkSession.sparkContext().setLogLevel("WARN")

Spark 2.3.2

sparkSession.sparkContext.setLogLevel("WARN")

Answered by mrsrinivas

Logging configuration at the Spark app level

With this approach, no code change is needed in the cluster for a Spark application.

  • Let's create a new file log4j.properties from log4j.properties.template.
  • Then change the verbosity with the log4j.rootCategory property.
  • Say we only need to check the ERRORs of a given jar; then set log4j.rootCategory=ERROR, console.

The spark-submit command would be:

spark-submit \
    ... # Other Spark props go here
    --files prop/file/location \
    --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    jar/location \
    [application arguments]

Now you will only see the logs that are categorised as ERROR.



Plain Log4j way w/o Spark (but needs a code change)

Set logging OFF for the org and akka packages (the snippet below uses ERROR):

import org.apache.log4j.{Level, Logger}

Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)

Answered by Leo

In Unix you can always redirect stderr to /dev/null, i.e.:

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> /dev/null
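
Note that 2> /dev/null discards every error message as well. If you still want the logs for later inspection, you could redirect stderr to a file instead, e.g. (the file name here is just an example):

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> spark-stderr.log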