05-01-2015 07:51 AM
When I launch a Spark job, logs are created in the HDFS /tmp/logs/<user-id>/logs folder but NOT in the /user/history/ folders!
Then, when I launch the JobHistory portal (http://<YARN-JobHistory-Server>:19888/jobhistory)
Is there a daemon that copies the logs from the /tmp/logs/<user-id>/logs folder to the /user/history/done & /user/history/done_intermediate ones?
Thank you in advance!
05-07-2015 02:10 PM
Thanks for your post. Regarding what you have reported: is the issue you're seeing specific to Spark jobs submitted to YARN? If so, it's important to note that the Job History Server is specific to MapReduce jobs run on YARN, not to Spark. The history of Spark jobs submitted to YARN is handled by a completely separate service called the Spark History Server.
Are you able to run a simple Pi MapReduce job on YARN, and does it appear in the JHS Web UI once it completes?
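Besides the web UI, the JHS also exposes a REST API at /ws/v1/history/mapreduce/jobs, which is a quick way to confirm whether a finished job (such as the Pi example) actually reached the history server. The sketch below assumes the default JHS port 19888 and a cluster without Kerberos/SPNEGO authentication; `fetch_jhs_jobs` and `finished_jobs` are hypothetical helper names, not part of any Hadoop client library.

```python
import json
import urllib.request

# Assumptions: the JobHistory Server (JHS) web port is the default 19888 and
# the cluster does not require Kerberos/SPNEGO authentication.
JHS_PORT = 19888
JHS_JOBS_PATH = "/ws/v1/history/mapreduce/jobs"


def fetch_jhs_jobs(host: str, port: int = JHS_PORT) -> dict:
    """Download the job list from the JHS REST API as parsed JSON."""
    url = f"http://{host}:{port}{JHS_JOBS_PATH}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def finished_jobs(payload: dict, name_contains: str = "") -> list:
    """Return (job id, state) pairs for jobs whose name contains a substring.

    The JHS only lists jobs that have already finished, so a job missing
    from this list has not (yet) been picked up by the history server.
    """
    # The "jobs" object may be null or empty when no jobs are recorded.
    jobs = (payload.get("jobs") or {}).get("job") or []
    needle = name_contains.lower()
    return [
        (job["id"], job["state"])
        for job in jobs
        if needle in job.get("name", "").lower()
    ]
```

For example, `finished_jobs(fetch_jhs_jobs("jhs.example.com"))` (placeholder host) lists every job the JHS currently knows about; passing a name substring narrows the list.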
05-09-2015 09:13 AM
Anthony, thank you for the clarification!
Given that the JHS is specific to MapReduce jobs run on YARN, where are other types of jobs shown?
(You said that Spark has its own JHS...)
BTW: Besides M/R and Spark jobs, what other types of jobs can we launch via YARN?
How are jobs moved from the /tmp/logs/<user-id>/logs folder to the /user/history/done & /user/history/done_intermediate ones?
Or are they created simultaneously?
Thank you for your assistance, it is much appreciated!
05-12-2015 09:59 AM
Responses inline below:
> Having said that the JobHistory Server is specific to MapReduce jobs run on YARN, where are other types of jobs shown?
That will depend on what kind of application is being submitted to the YARN framework.
We know that if an MR2 job is submitted, the job details are available in the Resource Manager Web UI while the job is running (as this is part of the YARN framework); once the job completes, the job details are available via the Job History Server.
If a Spark-on-YARN job is submitted, the job details are likewise available in the Resource Manager Web UI while the job is running; however, once the job completes, the details are available on the Spark History Server, a separate role/service that is configured when Spark-on-YARN is set up as a service in Cloudera Manager (or when configuring it in CDH, per our installation guide).
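The routing described above can be sketched as a small function that takes one application record from the ResourceManager REST API (/ws/v1/cluster/apps, whose entries carry `id`, `applicationType` and `state`) and picks the REST endpoint that should hold its details. This is a minimal illustration, not a Cloudera tool: `details_url` is a hypothetical name, the ports are assumed defaults (18088 is the Spark History Server port Cloudera Manager commonly uses; upstream Spark defaults to 18080), and the Spark History Server REST API only exists in Spark 1.4 and later.

```python
# Assumed default ports: ResourceManager 8088, JobHistory Server 19888,
# Spark History Server 18088 (upstream Spark's default is 18080).
RM_PORT, JHS_PORT, SHS_PORT = 8088, 19888, 18088

# YARN application states that mean the application has not finished yet.
ACTIVE_STATES = {"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED", "RUNNING"}


def details_url(app: dict, rm: str, jhs: str, shs: str) -> str:
    """Pick the REST endpoint expected to hold an application's details.

    `app` is one entry from the RM REST API (/ws/v1/cluster/apps).
    """
    app_id = app["id"]
    app_type = app.get("applicationType", "").upper()
    if app.get("state") not in ACTIVE_STATES:
        if app_type == "MAPREDUCE":
            # An MR job id reuses the application id's numeric suffix.
            job_id = app_id.replace("application_", "job_", 1)
            return f"http://{jhs}:{JHS_PORT}/ws/v1/history/mapreduce/jobs/{job_id}"
        if app_type == "SPARK":
            return f"http://{shs}:{SHS_PORT}/api/v1/applications/{app_id}"
    # Running applications (and other application types) are tracked by the RM.
    return f"http://{rm}:{RM_PORT}/ws/v1/cluster/apps/{app_id}"
```

The state check comes first because every YARN application, whatever its type, is visible through the Resource Manager while it runs; only finished MR and Spark applications are handed off to their respective history servers.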
> Besides MR and Spark jobs, what other types of jobs can we launch via YARN?
MR and Spark jobs are what is currently supported; however, this may change in the future as the need arises. YARN is application-agnostic and is intentionally designed to allow developers to create applications that run on its distributed framework.
Additional details regarding YARN applications are available at this link.
> Are jobs moved from the /tmp/logs/<user-id>/logs folder to the /user/history/done & /user/history/done_intermediate folders?
> Are they created simultaneously?
To best clarify the answer, listed below is a brief overview of the order of operations of a MR job in YARN:
1) MR job submitted to RM from client
2) Application folder is created in /tmp/logs/<user-id>/logs/application_xxxxxxxxxxxx_
3) MR job runs in YARN on the cluster
4) MR job completes; the job's counters are reported to the client that submitted the job
5) Counter information (.jhist file) and job_conf.xml files are written to /user/history/done_intermediate/<user>/job_xxxxxxx
6) The .jhist file and job_conf.xml are then moved from /user/history/done_intermediate/<user>/ to /user/history/done
7) Container logs from each Node Manager are aggregated into /tmp/logs/<user-id>/logs/application_xxxxxxxxxxxx_xxxx
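To make the answer to the original question concrete: nothing is moved from /tmp/logs into /user/history. The aggregated container logs and the job history files are separate outputs written to separate locations. The sketch below simply lists, for one application, where each step above leaves its output. It assumes the directory values that appear in this thread; on a given cluster they come from yarn.nodemanager.remote-app-log-dir, yarn.nodemanager.remote-app-log-dir-suffix, mapreduce.jobhistory.intermediate-done-dir and mapreduce.jobhistory.done-dir. `expected_paths` is a hypothetical helper name.

```python
# Assumed values (taken from the paths in this thread); check your cluster's config:
#   yarn.nodemanager.remote-app-log-dir         -> /tmp/logs
#   yarn.nodemanager.remote-app-log-dir-suffix  -> logs
#   mapreduce.jobhistory.intermediate-done-dir  -> /user/history/done_intermediate
#   mapreduce.jobhistory.done-dir               -> /user/history/done


def expected_paths(user: str, app_id: str,
                   remote_log_dir: str = "/tmp/logs",
                   log_suffix: str = "logs",
                   intermediate_dir: str = "/user/history/done_intermediate",
                   done_dir: str = "/user/history/done") -> list:
    """Return (step, HDFS location) pairs in the order listed above."""
    job_id = app_id.replace("application_", "job_", 1)
    app_log_dir = f"{remote_log_dir}/{user}/{log_suffix}/{app_id}"
    return [
        # Step 2: the application's log folder is created.
        ("application log folder created", app_log_dir),
        # Step 5: .jhist and job_conf.xml files, named after the job id.
        ("history files written", f"{intermediate_dir}/{user}/{job_id}"),
        # Step 6: those files are moved into the done directory.
        ("history files moved", done_dir),
        # Step 7: container logs land in the same log folder as step 2,
        # not under /user/history.
        ("container logs aggregated", app_log_dir),
    ]
```

Each returned location can then be checked by hand (for example with `hdfs dfs -ls`) to see which stage a given job has reached.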
Hope this helps!
02-07-2017 10:38 AM
02-07-2017 11:15 AM
Would you be able to provide additional context regarding the failure / permission issue that you're experiencing?
If a specific error message or symptom is occurring, could you provide more details about what is happening?