[Yarn/Spark] Having a hard time suppressing Spark INFO logs. Have searched here.

Explorer

I've searched this forum and elsewhere, and there seem to be plenty of ways to do this but none seem to have worked for me.

We recently started using Spark 2.2/Yarn at work and have been having mixed success with it. One troublesome thing is the incredible verbosity at the INFO level, which is where our driver's logs generally are.

This kind of thing:

  • 18/06/15 15:05:03 INFO TaskSetManager: Starting task 39.0 in stage 0.0 (TID 39, company02.mycomp.server.com, executor 1, partition 39, PROCESS_LOCAL, 5311 bytes)
  • 18/06/15 15:05:12 INFO ServerChannelGroup: Connection to /49.70.7.85:48820 accepted at Fri Jun 15 15:05:12 EDT 2018.
  • 18/06/15 15:05:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on company02.mycomp.server.com:43587 (size: 60.6 KB, free: 4.1 GB)

I've tried to use this log4j.properties file and pass it to both the driver and the executor, but nothing seems to work:

# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.com.mycompany=INFO
log4j.logger.org.http4s=INFO
log4j.logger.io.javalin=INFO
log4j.logger.org.spark_project=WARN
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=WARN
log4j.logger.parquet=WARN
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

9 REPLIES

@Robert Cornell

In my experience, the verbosity in Spark 2 has been greatly reduced compared to 1.6, especially in interactive interpreters like spark-shell.

Please check the default log4j settings under Ambari > Spark2 > conf and make sure the global log4j file is not setting any loggers to INFO.
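One rough way to run that check from a shell is sketched below. The function name find_verbose_loggers is hypothetical, and the path in the usage comment is only an assumption about where the cluster keeps the Spark2 client configuration; adjust it to wherever your log4j.properties actually lives.

```shell
# Hypothetical helper: print (with line numbers) every uncommented log4j 1.x
# logger definition in the given file whose level is INFO or more verbose.
find_verbose_loggers() {
  grep -nE '^[[:space:]]*log4j\.(rootCategory|rootLogger|logger\.[^=]+)[[:space:]]*=[[:space:]]*(INFO|DEBUG|TRACE|ALL)([[:space:]]*,|[[:space:]]*$)' "$1"
}

# Usage (the path is an assumption, not taken from the thread):
#   find_verbose_loggers /etc/spark2/conf/log4j.properties
```

No output means no logger in that file is set to INFO or below; grep then returns a non-zero exit status.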

If you wish to point to a specific log4j file, depending on the master and deployment mode you need to use one or more properties:

#yarn-client mode

bin/spark-submit --master yarn --deploy-mode client --files /path/to/log4j/log4j.properties --conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'" --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j/log4j.properties" 

#yarn-cluster mode

bin/spark-submit --master yarn --deploy-mode cluster --files /path/to/log4j/log4j.properties --conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'" --conf "spark.driver.extraJavaOptions='-Dlog4j.configuration=log4j.properties'" 

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

Explorer

You know, I had added the log4j.properties to --files, but I don't think I had added it to both the driver's and the executors' Java options at the same time. I'll give that a shot. Thanks.

@Robert Cornell if the above answer helped you please take a moment to login and click the "accept" link on the answer.

Explorer

Our Hortonworks cluster is down at the moment. Once it's up and I can test that this works, I will 🙂

Explorer

Unfortunately, this hasn't resolved the issue. We are still getting huge logs. Is the "file:/..." necessary in the driver Java options?

--conf spark.executor.extraJavaOptions='-Dlog4j.configuration=config/log4j.properties'

--driver-java-options -Dlog4j.configuration=config/log4j.properties

--files config/log4j.properties

@Robert Cornell I see you are using the path config/ in the executor extraJavaOptions. This won't work. Please copy my example exactly: use a path only where I use one, and reference the file name alone, without a path, where I did so.

HTH

@Robert Cornell Try this

--conf spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'
--driver-java-options -Dlog4j.configuration=config/log4j.properties
--files config/log4j.properties

I just removed the directory for the executor.

HTH

@Robert Cornell, once you pass a file using --files, it will be in the container's current working directory, so there is no need to provide a path as in "-Dlog4j.configuration=config/log4j.properties". Instead, just pass the file name, as @Felix Albani showed in the comment above.
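As a minimal sketch of that rule for yarn-client mode: the executors get only the bare file name (the copy shipped by --files), while the driver, which runs on the submitting machine, gets a file: URL with the full local path. log4j 1.x treats a value that is not a valid URL as a classpath resource name, which is likely why the file: prefix matters on the driver side. The function names are hypothetical and LOG4J_PATH is the placeholder path used earlier in this thread.

```shell
# Placeholder path from the thread; replace with the real local file.
LOG4J_PATH="${LOG4J_PATH:-/path/to/log4j/log4j.properties}"

# Executors: the file shipped with --files lands in the container's working
# directory, so only the bare file name is referenced.
executor_log4j_opt() {
  printf '%s' "-Dlog4j.configuration=$(basename "$1")"
}

# Driver in client mode: runs locally, so point at the local file by URL.
driver_log4j_opt() {
  printf '%s' "-Dlog4j.configuration=file:$1"
}

# Print the two option values for review.
printf '%s\n' "executor: $(executor_log4j_opt "$LOG4J_PATH")"
printf '%s\n' "driver:   $(driver_log4j_opt "$LOG4J_PATH")"

# Intended invocation, left commented out in this sketch:
#   bin/spark-submit --master yarn --deploy-mode client \
#     --files "$LOG4J_PATH" \
#     --conf "spark.executor.extraJavaOptions=$(executor_log4j_opt "$LOG4J_PATH")" \
#     --driver-java-options "$(driver_log4j_opt "$LOG4J_PATH")"
```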

Explorer

Hi @Sandeep Nemuri. I'm running client mode, so I believe I've followed his instructions correctly. For the driver and executor in client mode, @Felix Albani suggested the following:

--driver-java-options "-Dlog4j.configuration=file:/path/to/log4j/log4j.properties"

--conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'"

That is in addition to the --files option. I can confirm in the logs that log4j.properties does get uploaded.
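One way to check which configuration each JVM actually loads, offered as a suggestion rather than a confirmed fix, is log4j 1.x's own -Dlog4j.debug system property, which makes log4j print its initialization steps, including the configuration URL it picked. The sample lines in the question use a yy/MM/dd HH:mm:ss timestamp rather than the yyyy-MM-dd HH:mm:ss pattern from the custom file, which suggests the process that wrote them did not load that file. The helper name with_log4j_debug is hypothetical.

```shell
# Hypothetical helper: prepend log4j 1.x's internal debug switch to an
# existing JVM options string.
with_log4j_debug() {
  printf '%s' "-Dlog4j.debug=true $1"
}

# Example: the driver options from the post above, with debugging added.
with_log4j_debug "-Dlog4j.configuration=file:/path/to/log4j/log4j.properties"
echo

# Intended use, left commented out in this sketch:
#   --driver-java-options "$(with_log4j_debug "-Dlog4j.configuration=file:/path/to/log4j/log4j.properties")"
```

With this set, the driver's output should show which configuration file log4j chose, which can then be compared with the file that was uploaded.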