Created 12-13-2018 12:16 PM
Hi
I am trying to find job in wait time ie. when they got accepted and the time they are waiting to get any resources for execution. I know that if we go to yarn application logs, we can get this info. But I was wondering if the same can be answered through any yarn metrics available using yarn rest apis? Any pointers will be helpful.
Created 12-13-2018 01:19 PM
It's possible with Yarn RESTAPI(SubmitApplicationAPI) and in response we are going to have submittedTime and startTime the difference between these two unix timestamps would be the wait time.
Api:
GET http://<history server http address:port>/ws/v1/history/mapreduce/jobs
Refer to this link for more details regards to the Api documentation.
Created 12-13-2018 02:13 PM
I tried the above link and it leads me to mapreduce history server. Based on the documentation, I don't think I would be able to track yarn applications through it. It is looking for jobIDs not application IDs. It looks more like legacy mapreduce endpoint which would work only for mapreduce applications. This is the link I am refering to here. Am I missing something?
Created 12-13-2018 02:40 PM
I even tried yarn application timeserver and try to get something. But Not sure why its not showing the correct details -
curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0003 { "appId": "application_1544447997568_0003", "currentAppAttemptId": "appattempt_1544447997568_0003_-000001", "user": "root", "name": "Spark Pi", "queue": "default", "type": "SPARK", "host": "N/A", "rpcPort": -1, "appState": "RUNNING", "runningContainers": 0, "progress": 0.0, "diagnosticsInfo": "", "originalTrackingUrl": "N/A", "trackingUrl": "N/A", "finalAppStatus": "UNDEFINED", "submittedTime": 1544539343612, "startedTime": 1544539343612, "finishedTime": 0, "elapsedTime": 388755, "priority": 0, "allocatedCpuVcores": 0, "allocatedMemoryMB": 0, "unmanagedApplication": false, "appNodeLabelExpression": "<Not set>", "amNodeLabelExpression": "<DEFAULT_PARTITION>" ....... When I grepped the yarn logs for the same app I can see the job waited for a while before it was in Running state. I can see there is lag of at least 10 seconds before the state changed from Accepted to Running.
Created 12-13-2018 04:11 PM
I think I found a way to figure out if there are any lags in job submission and resource allocations, I am not sure if it gives the right figure but it will put me closer to what I am trying to find. Its a 2 part step -
1. To get application attempt ID from application timeserver
curl -s -X GET --negotiate <timelineserver:port>/ws/v1/applicationhistory/apps/<app_id>
eg. curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0005
{"appId":"application_1544447997568_0005","currentAppAttemptId":"appattempt_1544447997568_0005_000001","user":"root","name":"Spark Pi","queue":"default","type":"SPARK","host":"172.18.0.2","rpcPort":0,"appState":"FINISHED","runningContainers":0,"progress":100.0,"diagnosticsInfo":"","originalTrackingUrl":"sandbox-hdp.hortonworks.com:18081/history/application_1544447997568_0005/1","trackingUrl":"http://sandbox-hdp.hortonworks.com:8088/proxy/application_1544447997568_0005/","finalAppStatus":"SUCCEEDED","submittedTime":1544540416958,"startedTime":1544540416958,"finishedTime":1544541742388,"elapsedTime":1325430,"priority":0,"allocatedCpuVcores":0,"allocatedMemoryMB":0,"unmanagedApplication":false,"appNodeLabelExpression":"<Not set>","amNodeLabelExpression":"<DEFAULT_PARTITION>"}
You will get 2 main things from here -
2. Now I can make 2nd API call to get timelines for containers which were launched with the app attempt ID using the following call
curl -s -X GET --negotiate <timelineserver:port>/ws/v1/applicationhistory/apps/<app_id>/appattempts/<appattemptid>/containers
Taking above example -
curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0005...
{"container":[{"containerId":"container_1544447997568_0005_01_000001","allocatedMB":1000,"allocatedVCores":1,"assignedNodeId":"sandbox-hdp.hortonworks.com:45454","priority":0,"startedTime":1544541158902,"finishedTime":1544541742922,"elapsedTime":584020,"diagnosticsInfo":"","logUrl":"http://sandbox-hdp.hortonworks.com:8188/applicationhistory/logs/sandbox-hdp.hortonworks.com:45454/container_1544447997568_0005_01_000001/container_1544447997568_0005_01_000001/root","containerExitStatus":0,"containerState":"COMPLETE","nodeHttpAddress":"http://sandbox-hdp.hortonworks.com:8042","nodeId":"sandbox-hdp.hortonworks.com:45454"}]}
I can see the container startedTime using this call which is 1544541158902
I can use this to deduce Wait time between when the job started i.e. 1544540416958 and when the job's first container launched i.e. 1544541158902
Difference between these 2 timestamps i.e. 741944 ms i.e. 741 seconds i.e. 12.30 mins roughly
Upon checking yarn application logs of the same application ID, I can deduce the timelines roughly matches -
18/12/11 15:00:19 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:00:20 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:00:21 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:00:22 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:00:23 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) .................................... 18/12/11 15:13:12 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox-hdp.hortonworks.com, PROXY_URI_BASES -> http://sandbox-hdp.hortonworks.com:8088/proxy/application_1544447997 568_0005), /proxy/application_1544447997568_0005 18/12/11 15:13:12 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 18/12/11 15:13:13 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:13:14 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM) 18/12/11 15:13:14 INFO Client: Application report for application_1544447997568_0005 (state: RUNNING)
Hope I am on right path, and it might help someone who is looking for same.