Created on 09-06-2017 10:47 AM - edited 09-16-2022 05:12 AM
Hello,
I am running CDH 5.12 QuickStart VM with package installation (no parcels, and no CM).
I can't get Spark to write application event logs to the designated HDFS directory, and consequently nothing is displayed by the Spark History Server. My Spark jobs run as part of an Oozie workflow; the jobs succeed, but no event logs are produced.
My /etc/spark/conf/spark-defaults.conf contains:
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs:///user/spark/applicationHistory
spark.history.fs.logDirectory     hdfs:///user/spark/applicationHistory
spark.yarn.historyServer.address  http://quickstart.cloudera:18088
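For reference, my understanding is that these settings map to the following spark-submit flags, which I could also use for an ad-hoc test outside Oozie (same paths as above, other arguments elided):

```
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs:///user/spark/applicationHistory \
  ...
```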
The HDFS log directory has the following permissions:
$ sudo -u hdfs hadoop fs -ls /user/spark
Found 1 items
drwxrwxrwt   - spark spark          0 2017-09-06 13:31 /user/spark/applicationHistory
The Oozie Spark action runs on YARN, and it is defined as:
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn</master>
    <mode>cluster</mode>
    ....
</spark>
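In case the action does not pick up /etc/spark/conf/spark-defaults.conf at all, would I need to pass the event-log settings explicitly through <spark-opts>? Something like this (only the relevant fragment shown; I have not yet changed the action from the definition above):

```
<spark-opts>--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///user/spark/applicationHistory</spark-opts>
```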
The Oozie workflow runs correctly, and I can see the logs in the YARN History Server and in Hue's Oozie Dashboard. However, the Spark History Server shows this:
History Server
Event log directory: hdfs:///user/spark/applicationHistory
No completed applications found!
Did you specify the correct logging directory? Please verify your setting of spark.history.fs.logDirectory and whether you have the permissions to access it. It is also possible that your application did not run to completion or did not stop the SparkContext.
The HDFS directory /user/spark/applicationHistory is empty.
I have looked everywhere in the documentation, specifically here: https://www.cloudera.com/documentation/enterprise/5-11-x/topics/admin_spark_history_server.html, but I have not been able to find a solution. Please help.
Thanks in advance,
Alex Soto