Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Job ID for a scheduled job

Hi,

I'm wondering how to retrieve the job ID of a job submitted from a crontab entry that runs at regular intervals. For example, suppose my script runs a distcp job like this:

	 hadoop distcp hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path 

How can I find out the YARN job ID, so that my script can query the job's status until it completes and then take appropriate action?

PS: For various reasons we are not using Oozie, so this has to be done in a script and scheduled with crontab.

1 ACCEPTED SOLUTION

@bigdata.neophyte

To fetch the job status programmatically, you would need to use the JobStatus API: https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/JobStatus.html

If you want a simpler solution, you could try something like this:

1) Set a unique job name (e.g. one containing the date or time) using -Dmapred.job.name=testdist01

2) Get the application state and final status using:

yarn application -list -appStates ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING,FINISHED,FAILED,KILLED | grep -i "distcp: testdist01" | awk '{print $7,$8}'

FINISHED SUCCEEDED
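
The two steps above can be sketched as a pair of shell functions for a cron script. This is only a rough sketch under some assumptions: the function names `find_distcp_app` and `wait_for_distcp` are made up for illustration, and the parsing assumes that `yarn application -list` prints tab-separated columns in the order Application-Id, Application-Name, Application-Type, User, Queue, State, Final-State, which you should verify against your Hadoop version. Splitting on tabs rather than whitespace avoids counting the space inside the name "distcp: testdist01" as a column break, and the first column gives the application ID the question asks for.

```shell
# Reads `yarn application -list` output on stdin and prints
# "<application-id> <state> <final-state>" for the application whose
# name is exactly "distcp: <job name>". Column layout is an assumption.
find_distcp_app() {
  awk -F'\t' -v want="distcp: $1" '
    {
      # Strip the padding spaces around each tab-separated column.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      if ($2 == want) { print $1, $6, $7; exit }
    }'
}

# Polls YARN every <interval> seconds (default 30) until the named job
# reaches a terminal state. Returns 0 only if the final status is SUCCEEDED.
# Note: this loops forever if no application with that name ever appears.
wait_for_distcp() {
  job_name="$1"
  interval="${2:-30}"
  while :; do
    line=$(yarn application -list -appStates ALL 2>/dev/null | find_distcp_app "$job_name")
    case "$line" in
      *" FINISHED "*|*" FAILED "*|*" KILLED "*)
        echo "$line"
        case "$line" in
          *" SUCCEEDED") return 0 ;;
          *) return 1 ;;
        esac
        ;;
    esac
    sleep "$interval"
  done
}

# Possible use in the cron script (the job name pattern is illustrative):
#   name="testdist_$(date +%Y%m%d%H%M%S)"
#   hadoop distcp -Dmapred.job.name="$name" hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path &
#   wait_for_distcp "$name" 60 && echo "copy succeeded"
```

Matching the name exactly, rather than with grep -i, avoids picking up an older run whose name merely contains the same text.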

2 REPLIES

Another alternative would be to use the YARN REST API to submit the application:

With the New Application API, you can obtain an application-id which can then be used as part of the Cluster Submit Applications API to submit applications.

curl -X POST http://<resource_manager>:8088/ws/v1/cluster/apps/new-application
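
As a rough sketch of how a script might use that call, the functions below pull the application ID out of the JSON response and later query the application's state. Several things here are assumptions to check against your cluster: the helper names (`json_string_field`, `new_application_id`, `application_state`) are made up for illustration, the response is assumed to carry the ID in an "application-id" field, and the `/ws/v1/cluster/apps/<application-id>/state` endpoint is assumed to be available in your Hadoop version. The sed-based parsing only handles simple string fields; a real JSON parser would be more robust. Building the submission request body for the Cluster Submit Applications API is not shown.

```shell
# Prints the value of a top-level string field from JSON read on stdin.
# Deliberately simple: works for flat responses like {"state":"RUNNING"}.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# $1 = ResourceManager host:port, e.g. <resource_manager>:8088
# Asks the ResourceManager for a new application ID and prints it.
new_application_id() {
  curl -s -X POST "http://$1/ws/v1/cluster/apps/new-application" |
    json_string_field application-id
}

# $1 = ResourceManager host:port, $2 = application ID
# Prints the current state of the application (for example RUNNING).
application_state() {
  curl -s "http://$1/ws/v1/cluster/apps/$2/state" | json_string_field state
}
```

Because the ID is known before submission, the script can store it and poll `application_state` directly instead of searching the application list by name.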

Reference: Resource Manager REST API Documentation