Job ID for a scheduled job


Hi,

I am wondering how to retrieve the job ID for a job that is submitted from a script which crontab runs at regular intervals. For example, suppose my script runs a distcp job as below:

	 hadoop distcp hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path 

How can I find out the YARN job ID so that my script can query the job's status, wait for completion, and then take appropriate action?

PS: For various reasons we are not using Oozie, so we need to do this in a script and schedule it with crontab.

1 ACCEPTED SOLUTION


@bigdata.neophyte

You would need to use the JobStatus API to fetch the job status programmatically (https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/JobStatus.html).

If you want a simple solution you could try something like:

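1) Set a unique job name (e.g. a date or timestamp) using -Dmapred.job.name=testdist01. On Hadoop 2 this property is deprecated in favor of mapreduce.job.name, which should work the same way.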

2) Get the application state and final status using:

yarn application -list -appStates ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING,FINISHED,FAILED,KILLED | grep -i "distcp: testdist01" | awk '{print $7,$8}'

FINISHED SUCCEEDED
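
The two steps above could be combined in a cron script roughly as follows. This is only a sketch: the helper names app_status and run_copy and the distcp_<timestamp> naming scheme are made up for illustration, and the parsing assumes that `yarn application -list` prints tab-separated columns in the order id, name, type, user, queue, state, final state, progress, tracking URL. Check the output on your own cluster before relying on it.

```shell
#!/bin/sh
# Sketch only: helper names and the job-name scheme are illustrative.

# Print "<application-id> <state> <final-state>" for the distcp job named $1,
# reading `yarn application -list` output on stdin.
# Assumes tab-separated columns: id, name, type, user, queue, state,
# final-state, progress, tracking-url.
app_status() {
  awk -F'\t' -v name="distcp: $1" '
    {
      # Strip the padding that surrounds each column.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    }
    $2 == name { print $1, $6, $7 }
  '
}

# Run one copy ($1 = source, $2 = destination) under a unique job name,
# then report what YARN recorded for that name.
run_copy() {
  job_name="distcp_$(date +%Y%m%d_%H%M%S)"
  hadoop distcp -Dmapred.job.name="$job_name" "$1" "$2"
  rc=$?
  yarn application -list -appStates ALL | app_status "$job_name"
  return "$rc"
}
```

A cron entry would then call something like run_copy hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path. The first field printed is the application ID, which can be passed to yarn application -status <id> for more detail. As far as I know, hadoop distcp waits for the job to finish unless it is started with -async, so the exit code returned by run_copy is also a usable success signal.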


2 REPLIES


Another alternative would be to submit the application through the YARN REST API.

The New Application API returns an application ID, which you then pass to the Cluster Submit Applications API to submit the application. Because the ID is obtained up front, your script knows it before the job starts.

curl -X POST http://<resource_manager>:8088/ws/v1/cluster/apps/new-application
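
To use the returned ID in a script, the JSON response has to be parsed. Below is a rough sketch, assuming the response contains an "application-id" string field and that GET /ws/v1/cluster/apps/<application-id> returns "state" and "finalStatus" fields. RM_URL and the helper names json_field, new_application_id and app_state are placeholders invented here, not part of any Hadoop tool.

```shell
#!/bin/sh
# Sketch only: RM_URL and the helper names are illustrative assumptions.
RM_URL="${RM_URL:-http://localhost:8088}"

# Print the value of the string field named $1 from a JSON document on stdin.
# This is a naive sed match; a real script would likely use a JSON parser.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Ask the ResourceManager for a fresh application ID.
new_application_id() {
  curl -s -X POST "$RM_URL/ws/v1/cluster/apps/new-application" |
    json_field application-id
}

# Print "<state> <finalStatus>" for the application ID given as $1.
app_state() {
  body=$(curl -s "$RM_URL/ws/v1/cluster/apps/$1")
  printf '%s %s\n' \
    "$(printf '%s\n' "$body" | json_field state)" \
    "$(printf '%s\n' "$body" | json_field finalStatus)"
}
```

With RM_URL pointing at your ResourceManager, app_id=$(new_application_id) gives the ID to submit with, and app_state "$app_id" can then be called in a loop until the state reads FINISHED, FAILED or KILLED.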

Reference: Resource Manager REST API Documentation