Created on 09-06-2017 10:47 AM - edited 09-16-2022 05:12 AM
Hello,
I am running CDH 5.12 QuickStart VM with package installation (no parcels, and no CM).
I can't get Spark to produce application logs in the designated HDFS directory, and consequently nothing is displayed by Spark History Server. My Spark jobs run as part of an Oozie workflow, but no Spark logs are produced.
My /etc/spark/conf/spark-defaults.conf contains:
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///user/spark/applicationHistory
spark.history.fs.logDirectory hdfs:///user/spark/applicationHistory
spark.yarn.historyServer.address http://quickstart.cloudera:18088
The HDFS log directory has the following permissions:
$ sudo -u hdfs hadoop fs -ls /user/spark
Found 1 items
drwxrwxrwt   - spark spark          0 2017-09-06 13:31 /user/spark/applicationHistory
The Oozie Spark action runs on YARN and is defined as:
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn</master>
    <mode>cluster</mode>
    ....
</spark>
The Oozie workflow runs correctly, and I can see the logs in the YARN History Server and in Hue's Oozie Dashboard. However, the Spark History Server shows this:
History Server
Event log directory: hdfs:///user/spark/applicationHistory
No completed applications found!
Did you specify the correct logging directory? Please verify your setting of spark.history.fs.logDirectory and whether you have the permissions to access it.
It is also possible that your application did not run to completion or did not stop the SparkContext.
The HDFS directory /user/spark/applicationHistory is empty.
I have looked everywhere in the documentation, specifically here: https://www.cloudera.com/documentation/enterprise/5-11-x/topics/admin_spark_history_server.html, but I have not been able to find a solution. Please help.
Thanks in advance,
Alex Soto
Created 09-08-2017 08:59 AM
In case it helps others:
The file /etc/spark/conf/spark-defaults.conf is not used by Oozie Spark actions by default. To tell the Oozie Spark action to use this file, I had to add this to /etc/oozie/conf/oozie-site.xml:
<property>
    <name>oozie.service.SparkConfigurationService.spark.configurations</name>
    <value>*=/etc/spark/conf/</value>
</property>
Now I can see the logs in the Spark History Server. I wonder why this is not the default.
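For anyone who cannot edit oozie-site.xml, the same event log settings can probably be passed per action through the Spark action's <spark-opts> element instead. This is only a sketch that I have not tested on this VM: the property values are copied from the spark-defaults.conf above, and it assumes the element is placed after <jar> inside the <spark> action, as the spark-action schema requires.

```xml
<!-- Sketch only: goes inside the <spark> action, after the <jar> element. -->
<!-- Values are copied from /etc/spark/conf/spark-defaults.conf shown earlier in this thread. -->
<spark-opts>--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///user/spark/applicationHistory --conf spark.yarn.historyServer.address=http://quickstart.cloudera:18088</spark-opts>
```

The oozie-site.xml change applies to every Spark action on the server, while <spark-opts> only affects the workflow that declares it.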
Created 09-08-2017 11:57 AM
I am not sure this is the correct solution. I still cannot see my own application logs: I only see the Spark logs (driver and tasks). Anything I log from within a closure is not showing. I tried configuring /etc/spark/conf/log4j.properties, but it doesn't seem to make a difference. The only success I have had so far is getting the History Server to show something.
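One approach commonly suggested for Spark on YARN, which I have not verified in this setup, is to ship the log4j.properties file with the job and point the driver and executor JVMs at it, since a file under /etc/spark/conf on the gateway host is not necessarily what the YARN containers load. The HDFS path below is hypothetical, and Oozie's own launcher logging may still interfere. Note also that output logged inside closures goes to the executor container logs in YARN, not to the event log that the Spark History Server reads.

```xml
<!-- Sketch only: goes inside the <spark> action, after the <jar> element. -->
<!-- The HDFS location of log4j.properties is a hypothetical example path. -->
<spark-opts>--files ${nameNode}/user/spark/conf/log4j.properties --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties</spark-opts>
```

With --files, the file is copied into each container's working directory, which is why the option refers to it by bare file name.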
Created 07-31-2019 07:08 AM
Hi Alex,
Did you check the Oozie configuration or the Oozie logs to see whether the event logs are being written to a path other than the one configured in CM?
Thanks
AKR