Created 10-25-2016 11:15 AM
Hi,
I'm wondering how to retrieve the job ID of a job that is submitted from a script that crontab runs at regular intervals. For example, my script runs a distcp job like this:
hadoop distcp hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path
How can I find out the YARN application ID, so that my script can query the job status, wait for it to complete, and then take appropriate action?
PS: For various reasons we are not using Oozie, so this has to be done in a script scheduled with crontab.
Created 10-25-2016 11:54 AM
To fetch the job status programmatically, you would need to use the JobStatus API (https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/JobStatus.html).
If you want a simpler solution, you could try something like this:
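1) Set a unique job name (e.g. based on the date or time) with -Dmapred.job.name=testdist01. Generic -D options go before the source and target paths, e.g. hadoop distcp -Dmapred.job.name=testdist01 hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path
2) Get the application status with: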
yarn application -list -appStates ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING,FINISHED,FAILED,KILLED | grep -i "distcp: testdist01" | awk '{print $7,$8}'
This prints the application's State and Final-State, for example:
FINISHED SUCCEEDED
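The two steps above can be combined into a polling loop for the cron script. The following is a minimal POSIX sh sketch, not a tested production script: the function names app_state and wait_for_app, the polling interval and the retry limit are hypothetical choices, and it assumes the job name is unique (for example a timestamp) and contains no spaces, so that State and Final-State stay in awk fields 7 and 8. It uses -appStates ALL, which covers every state in the longer list above.

```shell
#!/bin/sh
# Sketch only: hypothetical helper functions for a cron script. Assumes the
# yarn CLI is on PATH and that the DistCp job name contains no spaces.

# Print "<State> <Final-State>" (e.g. "FINISHED SUCCEEDED") for the DistCp
# application named "distcp: <job_name>", or nothing if it is not listed.
# The "distcp: " prefix splits the name into two awk fields, so State and
# Final-State end up in fields 7 and 8.
app_state() {
  yarn application -list -appStates ALL 2>/dev/null \
    | grep -iF "distcp: $1" \
    | awk '{print $7, $8}' \
    | tail -n 1
}

# Poll every <interval> seconds (default 30), at most <max_polls> times
# (default 120). Returns 0 on FINISHED SUCCEEDED, 1 on any other final
# state, and 2 if no final state is seen within max_polls polls.
wait_for_app() {
  local job_name="$1" interval="${2:-30}" max_polls="${3:-120}" status
  local i=0
  while [ "$i" -lt "$max_polls" ]; do
    status="$(app_state "$job_name")"
    case "$status" in
      "FINISHED SUCCEEDED") return 0 ;;
      FINISHED*|FAILED*|KILLED*)
        echo "$job_name: $status" >&2
        return 1 ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "$job_name: no final state after $max_polls polls" >&2
  return 2
}

# Example use in the cron script (not executed here):
#   JOB_NAME="distcp_$(date +%Y%m%d%H%M%S)"
#   hadoop distcp -Dmapred.job.name="$JOB_NAME" \
#     hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path > "/tmp/$JOB_NAME.log" 2>&1 &
#   wait_for_app "$JOB_NAME" 30 120 && echo "copy succeeded"
```

Running distcp in the background (the trailing &) is what lets the script poll while the copy is still running; the max_polls limit keeps the loop from hanging forever if the job is never submitted.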
Created 10-26-2016 04:19 PM
Another alternative is to submit the application through the YARN ResourceManager REST API:
With the Cluster New Application API you can obtain an application ID, which you then pass to the Cluster Submit Applications API when submitting the application. Because the script obtained the ID before submitting, it already knows which application to poll.
curl -X POST http://<resource_manager>:8088/ws/v1/cluster/apps/new-application
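A minimal POSIX sh sketch of this ID-first approach follows. It only covers requesting a new application ID and later reading the application's state and finalStatus from the Cluster Application API (GET /ws/v1/cluster/apps/<app-id>); the submission request body itself is not shown. RM_URL and the function names are hypothetical, the JSON handling is a crude text match rather than a real parser, and depending on the cluster's security settings the POST may need to be authenticated.

```shell
#!/bin/sh
# Sketch only. RM_URL is a hypothetical variable holding the ResourceManager
# address, e.g. http://<resource_manager>:8088; curl must be on PATH.

# Print the string value of JSON key $1 read from stdin. Crude text match:
# splits the input on , { } and assumes the value is a plain string that
# contains no commas, braces or escaped quotes.
json_field() {
  tr ',{}' '\n\n\n' | sed -n "s/^ *\"$1\" *: *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Ask the RM for a new application ID (Cluster New Application API).
new_app_id() {
  curl -s -X POST "${RM_URL}/ws/v1/cluster/apps/new-application" \
    | json_field application-id
}

# Print "<state> <finalStatus>" for one application (Cluster Application API).
app_status() {
  local json
  json="$(curl -s "${RM_URL}/ws/v1/cluster/apps/$1")"
  printf '%s %s\n' "$(printf '%s\n' "$json" | json_field state)" \
                   "$(printf '%s\n' "$json" | json_field finalStatus)"
}
```

For example, APP_ID="$(new_app_id)" yields the ID to place in the submission request, and app_status "$APP_ID" can then be polled in a loop until the state is FINISHED, FAILED or KILLED. A proper JSON tool would be more robust than the sed match if one is available on the host.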
Reference: Resource Manager REST API Documentation