Member since
04-29-2016
10
Posts
1
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5382 | 07-18-2016 03:47 PM |
12-13-2018
04:11 PM
I think I found a way to figure out if there are any lags in job submission and resource allocations, I am not sure if it gives the right figure but it will put me closer to what I am trying to find. Its a 2 part step - 1. To get application attempt ID from application timeserver curl -s -X GET --negotiate <timelineserver:port>/ws/v1/applicationhistory/apps/<app_id> eg. curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0005 {"appId":"application_1544447997568_0005","currentAppAttemptId":"appattempt_1544447997568_0005_000001","user":"root","name":"Spark Pi","queue":"default","type":"SPARK","host":"172.18.0.2","rpcPort":0,"appState":"FINISHED","runningContainers":0,"progress":100.0,"diagnosticsInfo":"","originalTrackingUrl":"sandbox-hdp.hortonworks.com:18081/history/application_1544447997568_0005/1","trackingUrl":"http://sandbox-hdp.hortonworks.com:8088/proxy/application_1544447997568_0005/","finalAppStatus":"SUCCEEDED","submittedTime":1544540416958,"startedTime":1544540416958,"finishedTime":1544541742388,"elapsedTime":1325430,"priority":0,"allocatedCpuVcores":0,"allocatedMemoryMB":0,"unmanagedApplication":false,"appNodeLabelExpression":"<Not set>","amNodeLabelExpression":"<DEFAULT_PARTITION>"} You will get 2 main things from here -
AppAttemptID = appattempt_1544447997568_0005_000001 Job started time = 1544540416958 2. Now I can make 2nd API call to get timelines for containers which were launched with the app attempt ID using the following call curl -s -X GET --negotiate <timelineserver:port>/ws/v1/applicationhistory/apps/<app_id>/appattempts/<appattemptid>/containers Taking above example - curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0005/appattempts/appattempt_1544447997568_0005_000001/containers {"container":[{"containerId":"container_1544447997568_0005_01_000001","allocatedMB":1000,"allocatedVCores":1,"assignedNodeId":"sandbox-hdp.hortonworks.com:45454","priority":0,"startedTime":1544541158902,"finishedTime":1544541742922,"elapsedTime":584020,"diagnosticsInfo":"","logUrl":"http://sandbox-hdp.hortonworks.com:8188/applicationhistory/logs/sandbox-hdp.hortonworks.com:45454/container_1544447997568_0005_01_000001/container_1544447997568_0005_01_000001/root","containerExitStatus":0,"containerState":"COMPLETE","nodeHttpAddress":"http://sandbox-hdp.hortonworks.com:8042","nodeId":"sandbox-hdp.hortonworks.com:45454"}]} I can see the container startedTime using this call which is 1544541158902 I can use this to deduce Wait time between when the job started i.e. 1544540416958 and when the job's first container launched i.e. 1544541158902 Difference between these 2 timestamps i.e. 741944 ms i.e. 741 seconds i.e. 12.30 mins roughly Upon checking yarn application logs of the same application ID, I can deduce the timelines roughly matches -
18/12/11 15:00:19 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:00:20 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:00:21 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:00:22 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:00:23 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)
....................................
18/12/11 15:13:12 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox-hdp.hortonworks.com, PROXY_URI_BASES -> http://sandbox-hdp.hortonworks.com:8088/proxy/application_1544447997
568_0005), /proxy/application_1544447997568_0005 18/12/11 15:13:12 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 18/12/11 15:13:13 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED) 18/12/11 15:13:14 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
18/12/11 15:13:14 INFO Client: Application report for application_1544447997568_0005 (state: RUNNING)
Hope I am on right path, and it might help someone who is looking for same.
... View more
07-18-2016
03:47 PM
Solved the mystery. The issue was the "producer.close()" was missing. Main was exiting before kafka finished writing. Strange that this is not required in older api. Anyways for referenece from javadocs - The producer consists of a pool of buffer space that holds records that haven't yet been transmitted to the server* as well as a background I/O thread that is responsible for turning these records into requests and transmitting them* to the cluster. Failure to close the producer after use will leak these resources.
... View more