scala - How to reduce the verbosity of Spark's runtime output?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original: http://stackoverflow.com/questions/28189408/
How to reduce the verbosity of Spark's runtime output?
Asked by newBike
How to reduce the amount of trace info the Spark runtime produces?
The default is too verbose.
How can I turn it off, and turn it back on when I need it?
Thanks
Verbose mode
scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
15/01/28 09:57:24 INFO SparkContext: Starting job: collect at <console>:15
15/01/28 09:57:24 INFO DAGScheduler: Got job 3 (collect at <console>:15) with 1 output
...
15/01/28 09:57:24 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/01/28 09:57:24 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 626 bytes result sent to driver
15/01/28 09:57:24 INFO DAGScheduler: Stage 3 (collect at <console>:15) finished in 0.002 s
15/01/28 09:57:24 INFO DAGScheduler: Job 3 finished: collect at <console>:15, took 0.020061 s
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
Silent mode (expected)
scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
Accepted answer by Shyamendra Solanki
Quoting from the 'Learning Spark' book:
You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:
log4j.rootCategory=INFO, console

Then lower the log level so that we only show WARN messages and above by changing it to the following:

log4j.rootCategory=WARN, console

When you re-open the shell, you should see less output.
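For reference, a minimal conf/log4j.properties along these lines might look like the sketch below; it is adapted from the template, and the exact appender settings vary between Spark versions.

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n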
Answered by user5771281
Spark 1.4.1
sc.setLogLevel("WARN")
From the comments in the source code:
Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
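This also answers the "turn it off and back on when I need it" part of the question. A sketch of toggling the level on the fly in the spark-shell (the res number and output will differ on your machine):

scala> sc.setLogLevel("ERROR")   // silence INFO/WARN noise
scala> sc.parallelize(List(12,4,5,3,4,4,6,781)).collect
res0: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
scala> sc.setLogLevel("INFO")    // turn verbose output back on when needed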
Spark 2.x - 2.3.1
sparkSession.sparkContext().setLogLevel("WARN")
Spark 2.3.2
sparkSession.sparkContext.setLogLevel("WARN")
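In application code, a minimal Scala sketch might look like this (the app name and master URL are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("quiet-app")   // hypothetical name
  .master("local[*]")     // hypothetical master
  .getOrCreate()

// From here on, only WARN and above reach the console
spark.sparkContext.setLogLevel("WARN")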
Answered by mrsrinivas
Logging configuration at the Spark app level
With this approach, no code change is needed in the cluster for a Spark application.
- Create a new file log4j.properties from log4j.properties.template.
- Then change the verbosity with the log4j.rootCategory property.
- Say we need to check only ERRORs of a given jar; then:

log4j.rootCategory=ERROR, console
The spark-submit command would be:
spark-submit \
... # Other Spark props go here
--files prop/file/location \
--conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
--conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
jar/location \
[application arguments]
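For instance, a hypothetical invocation might look like the sketch below (the class, paths, and jar name are made up). --files ships the properties file into each executor's working directory, so the executor option can reference it by name, while the driver option points at the local copy (assuming client mode):

spark-submit \
  --class com.example.QuietApp \
  --files /tmp/log4j.properties \
  --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties' \
  --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' \
  /tmp/quiet-app.jar arg1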
Now you will see only the logs categorized as ERROR.
Plain Log4j way w/o Spark (but needs a code change)
Set the log level to ERROR for the org and akka packages:
import org.apache.log4j.{Level, Logger}
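// Suppress INFO/WARN chatter from Spark's org.* and akka.* loggers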
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)
Answered by Leo
In Unix you could always pipe stderr to /dev/null, i.e.:
run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> /dev/null
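If you'd rather keep the logs for later inspection than discard them, redirect stderr to a file instead (the filename here is just an example):

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> spark.log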