Created 05-20-2016 08:13 AM
Hi everybody,
I have a problem querying job info via the REST API. In the past, when I needed to see the list of containers created for a specific job, I could query it as below and it showed me all the container information.
curl -XGET http://svr02.spo:8188/ws/v1/applicationhistory/apps/application_1445602639127_0365/appattempts/appat...
{"container":[
  {
    "containerId":"container_e68_1445602639127_0365_01_000003",
    "allocatedMB":2048,
    "allocatedVCores":1,
    "assignedNodeId":"svr15.spo:45454",
    "priority":10,
    "startedTime":1446131587303,
    "finishedTime":1446131593180,
    "elapsedTime":5877,
    "diagnosticsInfo":"Container killed by the ApplicationMaster.\nContainer killed on request. Exit code is 143\nContainer exited with a non-zero exit code 143\n",
    "logUrl":"http://svr02.spo:8188/applicationhistory/logs/svr15.spo:45454/container_e68_1445602639127_0365_01_000003/container_e68_1445602639127_0365_01_000003/annemarie",
    "containerExitStatus":-105,
    "containerState":"COMPLETE",
    "nodeHttpAddress":"http://svr15.spo:8042"
  },
  {
    "containerId":"container_e68_1445602639127_0365_01_000002",
    "allocatedMB":2048,
    "allocatedVCores":1,
    "assignedNodeId":"svr12.spo:45454",
    "priority":20,
    "startedTime":1446131578209,
    "finishedTime":1446131586479,
    "elapsedTime":8270,
    "diagnosticsInfo":"Container killed by the ApplicationMaster.\nContainer killed on request. Exit code is 143\nContainer exited with a non-zero exit code 143\n",
    "logUrl":"http://svr02.spo:8188/applicationhistory/logs/svr12.spo:45454/container_e68_1445602639127_0365_01_000002/container_e68_1445602639127_0365_01_000002/annemarie",
    "containerExitStatus":-105,
    "containerState":"COMPLETE",
    "nodeHttpAddress":"http://svr12.spo:8042"
  },
  {
    "containerId":"container_e68_1445602639127_0365_01_000001",
    "allocatedMB":1024,
    "allocatedVCores":1,
    "assignedNodeId":"svr04.spo:45454",
    "priority":0,
    "startedTime":1446131572596,
    "finishedTime":1446131599631,
    "elapsedTime":27035,
    "diagnosticsInfo":"",
    "logUrl":"http://svr02.spo:8188/applicationhistory/logs/svr04.spo:45454/container_e68_1445602639127_0365_01_000001/container_e68_1445602639127_0365_01_000001/annemarie",
    "containerExitStatus":0,
    "containerState":"COMPLETE",
    "nodeHttpAddress":"http://svr04.spo:8042"
  }
]}
However, when I run the same query later, the response shows only one container: the application master. The information for the other containers has mysteriously disappeared:
curl -XGET http://svr02.spo:8188/ws/v1/applicationhistory/apps/application_1445602639127_0365/appattempts/appat...
{"container":[
  {
    "containerId":"container_e68_1445602639127_0365_01_000001",
    "allocatedMB":1024,
    "allocatedVCores":1,
    "assignedNodeId":"svr04.spo:45454",
    "priority":0,
    "startedTime":1446131572596,
    "finishedTime":1446131599631,
    "elapsedTime":27035,
    "diagnosticsInfo":"",
    "logUrl":"http://svr02.spo:8188/applicationhistory/logs/svr04.spo:45454/container_e68_1445602639127_0365_01_000001/container_e68_1445602639127_0365_01_000001/annemarie",
    "containerExitStatus":0,
    "containerState":"COMPLETE",
    "nodeHttpAddress":"http://svr04.spo:8042"
  }
]}
Has anyone had the same problem or seen this before? I really need the full list of containers because I want to query each container's start time and end time.
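For context, this is roughly how I pull the start and end times out of the response. It is only a minimal sketch: the function name container_times is my own, the sample string is a trimmed copy of the AM container entry from the response, and I am assuming startedTime and finishedTime are epoch milliseconds.

```python
import json
from datetime import datetime, timezone


def container_times(response_text):
    """Return (containerId, start, finish, elapsed_ms) for each container."""
    containers = json.loads(response_text).get("container", [])
    rows = []
    for c in containers:
        # Assumption: timestamps are epoch milliseconds.
        start = datetime.fromtimestamp(c["startedTime"] / 1000, tz=timezone.utc)
        finish = datetime.fromtimestamp(c["finishedTime"] / 1000, tz=timezone.utc)
        rows.append((c["containerId"], start, finish,
                     c["finishedTime"] - c["startedTime"]))
    return rows


# Trimmed copy of the AM container entry from the response.
sample = (
    '{"container":[{"containerId":"container_e68_1445602639127_0365_01_000001",'
    '"startedTime":1446131572596,"finishedTime":1446131599631}]}'
)

for cid, start, finish, elapsed_ms in container_times(sample):
    print(cid, start.isoformat(), finish.isoformat(), elapsed_ms)
```

When only the AM container comes back, this loop yields a single row, which is why the missing entries are a problem for me.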
Thank you for your help!!
Created 05-23-2016 07:44 PM
According to the Apache JIRA YARN-3978 (https://issues.apache.org/jira/browse/YARN-3978), there is a configuration option that turns off saving of non-AM container metadata. To get non-AM container details, set both "yarn.timeline-service.generic-application-history.save-non-am-container-meta-info" and "yarn.timeline-service.generic-application-history.enabled" to true, then restart the RM and ATS.
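If it helps, here is a small sketch for checking whether both properties are explicitly set to true in a yarn-site.xml. The function name and the sample XML are made up for illustration, and it only looks at what is written in the file; a property that is absent falls back to the cluster default, which this check cannot see.

```python
import xml.etree.ElementTree as ET

# The two property names mentioned above.
REQUIRED = (
    "yarn.timeline-service.generic-application-history.enabled",
    "yarn.timeline-service.generic-application-history.save-non-am-container-meta-info",
)


def not_explicitly_true(yarn_site_xml):
    """Return the required properties that are not set to 'true' in the XML text."""
    root = ET.fromstring(yarn_site_xml)
    values = {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip().lower()
        for prop in root.iter("property")
    }
    return [name for name in REQUIRED if values.get(name) != "true"]


# Hypothetical yarn-site.xml content, for illustration only.
sample_site = """
<configuration>
  <property>
    <name>yarn.timeline-service.generic-application-history.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.timeline-service.generic-application-history.save-non-am-container-meta-info</name>
    <value>false</value>
  </property>
</configuration>
"""

for name in not_explicitly_true(sample_site):
    print("not set to true:", name)
```

An empty result means both properties are set to true in the file; the RM and ATS restart is still needed for the change to take effect.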