Support Questions

Find answers, ask questions, and share your expertise

Yarn Identity job in wait time?

avatar
Contributor

Hi

I am trying to find job in wait time ie. when they got accepted and the time they are waiting to get any resources for execution. I know that if we go to yarn application logs, we can get this info. But I was wondering if the same can be answered through any yarn metrics available using yarn rest apis? Any pointers will be helpful.

4 REPLIES 4

avatar
Master Guru
@Praveenesh Kumar

It's possible with Yarn RESTAPI(SubmitApplicationAPI) and in response we are going to have submittedTime and startTime the difference between these two unix timestamps would be the wait time.

Api:

 GET http://<history server http address:port>/ws/v1/history/mapreduce/jobs

Refer to this link for more details regards to the Api documentation.

avatar
Contributor

I tried the above link and it leads me to mapreduce history server. Based on the documentation, I don't think I would be able to track yarn applications through it. It is looking for jobIDs not application IDs. It looks more like legacy mapreduce endpoint which would work only for mapreduce applications. This is the link I am refering to here. Am I missing something?

avatar
Contributor

I even tried yarn application timeserver and try to get something. But Not sure why its not showing the correct details -

curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0003

{
	"appId": "application_1544447997568_0003",
	"currentAppAttemptId": "appattempt_1544447997568_0003_-000001",
	"user": "root",
	"name": "Spark Pi",
	"queue": "default",
	"type": "SPARK",
	"host": "N/A",
	"rpcPort": -1,
	"appState": "RUNNING",
	"runningContainers": 0,
	"progress": 0.0,
	"diagnosticsInfo": "",
	"originalTrackingUrl": "N/A",
	"trackingUrl": "N/A",
	"finalAppStatus": "UNDEFINED",
	"submittedTime": 1544539343612,
	"startedTime": 1544539343612,
	"finishedTime": 0,
	"elapsedTime": 388755,
	"priority": 0,
	"allocatedCpuVcores": 0,
	"allocatedMemoryMB": 0,
	"unmanagedApplication": false,
	"appNodeLabelExpression": "<Not set>",
	"amNodeLabelExpression": "<DEFAULT_PARTITION>"
.......
When I grepped the yarn logs for the same app I can see the job waited for a while before it was in Running state. I can see there is lag of at least 10 seconds before the state changed from Accepted to Running. 



avatar
Contributor

I think I found a way to figure out if there are any lags in job submission and resource allocations, I am not sure if it gives the right figure but it will put me closer to what I am trying to find. Its a 2 part step -

1. To get application attempt ID from application timeserver

 curl -s -X GET --negotiate <timelineserver:port>/ws/v1/applicationhistory/apps/<app_id>

eg. curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0005

{"appId":"application_1544447997568_0005","currentAppAttemptId":"appattempt_1544447997568_0005_000001","user":"root","name":"Spark Pi","queue":"default","type":"SPARK","host":"172.18.0.2","rpcPort":0,"appState":"FINISHED","runningContainers":0,"progress":100.0,"diagnosticsInfo":"","originalTrackingUrl":"sandbox-hdp.hortonworks.com:18081/history/application_1544447997568_0005/1","trackingUrl":"http://sandbox-hdp.hortonworks.com:8088/proxy/application_1544447997568_0005/","finalAppStatus":"SUCCEEDED","submittedTime":1544540416958,"startedTime":1544540416958,"finishedTime":1544541742388,"elapsedTime":1325430,"priority":0,"allocatedCpuVcores":0,"allocatedMemoryMB":0,"unmanagedApplication":false,"appNodeLabelExpression":"<Not set>","amNodeLabelExpression":"<DEFAULT_PARTITION>"}

You will get 2 main things from here -

  • AppAttemptID = appattempt_1544447997568_0005_000001
  • Job started time = 1544540416958

2. Now I can make 2nd API call to get timelines for containers which were launched with the app attempt ID using the following call

curl -s -X GET --negotiate <timelineserver:port>/ws/v1/applicationhistory/apps/<app_id>/appattempts/<appattemptid>/containers

Taking above example -

curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0005...

{"container":[{"containerId":"container_1544447997568_0005_01_000001","allocatedMB":1000,"allocatedVCores":1,"assignedNodeId":"sandbox-hdp.hortonworks.com:45454","priority":0,"startedTime":1544541158902,"finishedTime":1544541742922,"elapsedTime":584020,"diagnosticsInfo":"","logUrl":"http://sandbox-hdp.hortonworks.com:8188/applicationhistory/logs/sandbox-hdp.hortonworks.com:45454/container_1544447997568_0005_01_000001/container_1544447997568_0005_01_000001/root","containerExitStatus":0,"containerState":"COMPLETE","nodeHttpAddress":"http://sandbox-hdp.hortonworks.com:8042","nodeId":"sandbox-hdp.hortonworks.com:45454"}]}

I can see the container startedTime using this call which is 1544541158902

I can use this to deduce Wait time between when the job started i.e. 1544540416958 and when the job's first container launched i.e. 1544541158902

Difference between these 2 timestamps i.e. 741944 ms i.e. 741 seconds i.e. 12.30 mins roughly

Upon checking yarn application logs of the same application ID, I can deduce the timelines roughly matches -

18/12/11 15:00:19 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)                                    18/12/11 15:00:20 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)                                    18/12/11 15:00:21 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)                                    18/12/11 15:00:22 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)                                    18/12/11 15:00:23 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)   
....................................
18/12/11 15:13:12 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox-hdp.hortonworks.com, PROXY_URI_BASES -> http://sandbox-hdp.hortonworks.com:8088/proxy/application_1544447997
568_0005), /proxy/application_1544447997568_0005                                                                                          18/12/11 15:13:12 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter                              18/12/11 15:13:13 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)                                    18/12/11 15:13:14 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)                                                                                                                           
18/12/11 15:13:14 INFO Client: Application report for application_1544447997568_0005 (state: RUNNING)

Hope I am on right path, and it might help someone who is looking for same.