Job ID for a scheduled job

bigdata_superno — Tue, 25 Oct 2016 18:15:06 GMT

Hi,

I'm wondering how to retrieve the job ID for a job that is submitted from a crontab entry scheduled to run at regular intervals. For example, suppose my script runs a distcp job like this:

	 hadoop distcp hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path

How can I find the YARN job ID so that my script can poll the job's status until it completes and then take the appropriate action?

PS: For various reasons, we are not using Oozie, so this has to be done in a script scheduled with crontab.

Re: Job ID for a scheduled job

sandyy006 — Tue, 25 Oct 2016 18:54:34 GMT

@bigdata.neophyte

You would need to use this API to fetch the job status: https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/JobStatus.html

If you want a simple solution, you could try something like:

1) Set a unique job name (e.g. a date or timestamp) using -Dmapred.job.name=testdist01

2) Get the app status using:

yarn application -list -appStates ALL | grep -i "distcp: testdist01" | awk '{print $7,$8}'

FINISHED SUCCEEDED
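To use this from a cron script, the lookup can be wrapped in a small helper that also returns the application ID and waits for a terminal state. This is only a rough sketch: the function names `find_app` and `wait_for_app` are made up for illustration, and it assumes the `yarn application -list` columns end with State, Final-State, Progress and Tracking-URL, so check that against the output on your cluster.

```shell
#!/usr/bin/env bash
# Sketch: look up a YARN application by job name and wait for it to finish.
# find_app and wait_for_app are illustrative names, not Hadoop commands.

# Reads `yarn application -list` output on stdin and prints
# "<application-id> <state> <final-state>" for lines containing the job name.
# Assumption: the last four columns are State, Final-State, Progress, Tracking-URL,
# so the fields are counted from the end (the job name itself may contain spaces).
find_app() {
  awk -v name="$1" '$1 ~ /^application_/ && index($0, name) { print $1, $(NF-3), $(NF-2) }'
}

# Polls until the application reaches a terminal state.
# Returns 0 if the final state is SUCCEEDED, 1 otherwise, 2 if it gives up.
# Arguments: job name, poll interval in seconds (default 30), max polls (default 120).
wait_for_app() {
  local job_name="$1" interval="${2:-30}" max_tries="${3:-120}"
  local line state final tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    line=$(yarn application -list -appStates ALL 2>/dev/null | find_app "$job_name" | head -n 1)
    state=$(printf '%s\n' "$line" | awk '{print $2}')
    final=$(printf '%s\n' "$line" | awk '{print $3}')
    case "$state" in
      FINISHED|FAILED|KILLED)
        [ "$final" = "SUCCEEDED" ]
        return
        ;;
    esac
    tries=$((tries + 1))
    sleep "$interval"
  done
  return 2
}
```

In the cron script you would then call something like `wait_for_app "distcp: testdist01" 60 && echo "copy done"`, using whatever unique name was set in step 1.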

Re: Job ID for a scheduled job

jlopez — Wed, 26 Oct 2016 23:19:41 GMT

Another option would be to use the YARN REST API to submit the application:

The Cluster New Application API returns an application ID, which you then pass to the Cluster Applications API (Submit Application) when submitting the application.

curl -X POST http://<resource_manager>:8088/ws/v1/cluster/apps/new-application
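The response to that call is a JSON document with an "application-id" field, and the same ID can later be passed to GET /ws/v1/cluster/apps/{appid} to read "state" and "finalStatus". A rough sketch of pulling those values out in a script follows; `RM_URL`, `json_field`, `new_app_id` and `app_status` are hypothetical names used only for illustration, and the grep/sed matching is a shortcut rather than a real JSON parser (jq would be more robust if it is available).

```shell
#!/usr/bin/env bash
# Sketch only: RM_URL and the function names below are illustrative.
RM_URL="${RM_URL:-http://<resource_manager>:8088}"

# Print the string value of the first occurrence of a JSON key read from stdin.
# Plain text matching, not a JSON parser; works for simple string values only.
json_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Ask the ResourceManager for a new application ID.
new_app_id() {
  curl -s -X POST "$RM_URL/ws/v1/cluster/apps/new-application" | json_field application-id
}

# Print "<state> <finalStatus>" for the given application ID.
app_status() {
  local body
  body=$(curl -s "$RM_URL/ws/v1/cluster/apps/$1")
  echo "$(printf '%s' "$body" | json_field state) $(printf '%s' "$body" | json_field finalStatus)"
}
```

A script could store the result of `new_app_id` in a variable, use it in the submission request, and then call `app_status` with it in a loop until the state is FINISHED, FAILED or KILLED.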

Reference: Resource Manager REST API Documentation