Created 06-15-2018 07:15 PM
I've searched this forum and elsewhere; there seem to be plenty of ways to do this, but none have worked for me.
We recently started using Spark 2.2 on YARN at work and have had mixed success with it. One troublesome thing is the incredible verbosity at the INFO level, which is the level our driver generally logs at.
I've tried to use this log4j.properties file, passing it to both the driver and the executors, but nothing seems to work:
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.com.mycompany=INFO
log4j.logger.org.http4s=INFO
log4j.logger.io.javalin=INFO
log4j.logger.org.spark_project=WARN
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=WARN
log4j.logger.parquet=WARN

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
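A side note, separate from the log4j file itself: Spark also offers a runtime override, SparkContext.setLogLevel, which resets the root logger level once the context is up (startup INFO noise still appears before it takes effect). In spark-shell:

sc.setLogLevel("WARN")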
Created 06-16-2018 01:11 PM
In my experience the verbosity in Spark 2 has been greatly reduced compared to 1.6, especially in interactive interpreters like spark-shell.
Please check the default log4j in Ambari > Spark2 > conf and make sure the global log4j file is not setting any loggers to INFO.
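For reference, the line to look for in that Ambari-managed template is the root logger. In a stock Spark2 config it usually reads something like the following (exact contents can differ per cluster); raising it to WARN quiets things globally:

log4j.rootCategory=INFO, console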
If you wish to point to a specific log4j file, then depending on the master and deployment mode you need to set one or more of the following properties:
#yarn-client mode
bin/spark-submit --master yarn --deploy-mode client \
  --files /path/to/log4j/log4j.properties \
  --conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'" \
  --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j/log4j.properties"

#yarn-cluster mode
bin/spark-submit --master yarn --deploy-mode cluster \
  --files /path/to/log4j/log4j.properties \
  --conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'" \
  --conf "spark.driver.extraJavaOptions='-Dlog4j.configuration=log4j.properties'"
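To verify which configuration log4j actually picked up (an optional debugging step, not part of the commands above), log4j 1.x prints its own configuration-loading trace to stderr when -Dlog4j.debug=true is set. A sketch for client mode:

bin/spark-submit --master yarn --deploy-mode client \
  --files /path/to/log4j/log4j.properties \
  --conf "spark.executor.extraJavaOptions='-Dlog4j.debug=true -Dlog4j.configuration=log4j.properties'" \
  --driver-java-options "-Dlog4j.debug=true -Dlog4j.configuration=file:/path/to/log4j/log4j.properties"

Look for lines like "log4j: Using URL [...] for automatic log4j configuration." in the driver output and in the executor container logs.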
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 06-17-2018 04:37 AM
You know, I had added the log4j.properties to --files, but I don't think I had added it to both the driver and executor Java options at the same time. I'll give that a shot. Thanks.
Created 06-18-2018 01:36 PM
@Robert Cornell, if the above answer helped you, please take a moment to log in and click the "accept" link on the answer.
Created 06-18-2018 01:39 PM
Our Hortonworks cluster is down at the moment. Once it's up and I can test that this works, I will 🙂
Created 06-20-2018 09:11 PM
Unfortunately this hasn't resolved the issue; we are still getting huge logs. Is the "file:/..." prefix necessary in the driver Java options?
--conf spark.executor.extraJavaOptions='-Dlog4j.configuration=config/log4j.properties'
--driver-java-options -Dlog4j.configuration=config/log4j.properties
--files config/log4j.properties
Created 06-21-2018 12:05 PM
@Robert Cornell I see you are using a path in the executor extraJavaOptions. This won't work. Please copy my example exactly: use a path only where I used a path, and reference the file name alone, without a path, where I did so.
HTH
Created 06-21-2018 04:28 PM
@Robert Cornell Try this
--conf spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'
--driver-java-options -Dlog4j.configuration=config/log4j.properties
--files config/log4j.properties
I just removed the directory for the executor.
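Put together, a full client-mode submit would look roughly like this (the application class and jar are placeholders, not from this thread):

bin/spark-submit --master yarn --deploy-mode client \
  --files config/log4j.properties \
  --conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'" \
  --driver-java-options "-Dlog4j.configuration=config/log4j.properties" \
  --class com.example.MyApp /path/to/app.jar

One caveat on your "file:/..." question: log4j 1.x treats a -Dlog4j.configuration value without a URL scheme as a classpath resource, so if the plain config/log4j.properties path does not take effect on the driver, try the file: prefix form from my earlier example.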
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 06-21-2018 06:30 AM
@Robert Cornell, once you pass a file using --files it will be in the container's current working directory, so there is no need to provide a path in "-Dlog4j.configuration=config/log4j.properties". Instead, just pass the file name, as @Felix Albani provided in the comment above.
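To double-check what actually got localized into the container working directory, the aggregated YARN logs are a quick way to inspect each container's output (a generic YARN debugging step; replace <application_id> with your application's id):

yarn logs -applicationId <application_id> | grep -i log4j

This surfaces any log4j-related lines from the driver and executor containers, including log4j's own configuration trace if -Dlog4j.debug=true was set.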
Created 06-21-2018 04:24 PM
Hi @Sandeep Nemuri. I'm running client mode, so I believe I've followed his instructions correctly. For the driver and executor in client mode, @Felix Albani suggested the following:
--driver-java-options "-Dlog4j.configuration=file:/path/to/log4j/log4j.properties"
--conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.properties'"
That's in addition to the --files flag. I can confirm in the logs that log4j.properties does get uploaded.
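One quick way to confirm that, and to see which configuration the driver actually loads, is to grep the submit output (replace <submit args> with the full set of flags above; exact log wording varies by Spark version, so treat the pattern as approximate):

bin/spark-submit <submit args> 2>&1 | grep -i log4j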