The Tez job submitted by the user initially used many of the 32 available containers, but the AM later released most of them, leaving just 2 containers to run the job. The job completed successfully, but I would like to know why the containers were released/killed with the messages below.
2016-10-27 13:06:15,388 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=q1edvl OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1477338718210_1792 CONTAINERID=container_e118_1477338718210_1792_01_000010
2016-10-27 13:06:15,503 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=q1edvl IP=188.8.131.52 OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1477338718210_1792 CONTAINERID=container_e118_1477338718210_1792_01_000010
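Note that both audit entries above name the same CONTAINERID: the release is logged as SUCCESS at 13:06:15,388 and then as FAILURE at 13:06:15,503. That would be consistent with a second release request arriving for a container the ResourceManager had already released, although the log alone does not prove it. To check how often this pattern occurs, here is a minimal Python sketch that scans audit lines for containers whose release was logged with both results. The function names are hypothetical, and the parser assumes every field has the form UPPERCASE_KEY=value, as in the two lines quoted above.

```python
import re
from collections import defaultdict

# Assumption: audit fields look like KEY=value with an all-uppercase key,
# and a value runs until the next " KEY=" or the end of the line.
FIELD = re.compile(r"([A-Z]+)=(.*?)(?=\s+[A-Z]+=|$)")

def parse_audit_line(line):
    """Return the KEY=value fields of one RMAuditLogger line as a dict."""
    return {key: value.strip() for key, value in FIELD.findall(line)}

def find_repeated_releases(lines):
    """Return container IDs whose release was logged as both SUCCESS and FAILURE."""
    results = defaultdict(list)
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("OPERATION") == "AM Released Container" and "CONTAINERID" in fields:
            results[fields["CONTAINERID"]].append(fields.get("RESULT"))
    return [cid for cid, seen in results.items()
            if "SUCCESS" in seen and "FAILURE" in seen]

# The two lines quoted in the question.
LINES = [
    "2016-10-27 13:06:15,388 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: "
    "USER=q1edvl OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS "
    "APPID=application_1477338718210_1792 CONTAINERID=container_e118_1477338718210_1792_01_000010",
    "2016-10-27 13:06:15,503 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: "
    "USER=q1edvl IP=188.8.131.52 OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE "
    "DESCRIPTION=Trying to release container not owned by app or with invalid id. "
    "PERMISSIONS=Unauthorized access or invalid container "
    "APPID=application_1477338718210_1792 CONTAINERID=container_e118_1477338718210_1792_01_000010",
]

repeated = find_repeated_releases(LINES)
for container_id in repeated:
    print(container_id)
```

To run it against a real ResourceManager log, pass the open file object (or any iterable of lines) to find_repeated_releases instead of LINES.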
I have this issue too. In my case preemption is not happening, but I still see the RESULT=FAILURE errors above in the log. Do you know how to fix this?
The following takes place:
1) When a job runs in Tez, it allocates the maximum available memory and CPU from the DataNodes.
2) When other jobs are submitted and there is not much capacity available, they take memory and cores away from the resources the existing job is holding.
It is better to use the Capacity Scheduler in YARN; it will give better performance for the jobs.
Assign the jobs to YARN queues.
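As a rough illustration of the queue setup, here is a minimal Python sketch that builds the capacity-scheduler.xml properties for leaf queues directly under root. The queue names "etl" and "adhoc", the 60/40 split, and the helper function name are assumptions made up for the example; the property names yarn.scheduler.capacity.root.queues and yarn.scheduler.capacity.root.<queue>.capacity are the standard Capacity Scheduler ones. The capacities of sibling queues have to add up to 100, so the sketch checks that first.

```python
def capacity_scheduler_properties(queues):
    """Build capacity-scheduler.xml properties for leaf queues under root.

    queues maps a queue name to its capacity in percent (hypothetical values).
    """
    total = sum(queues.values())
    if abs(total - 100) > 1e-6:
        # Sibling queue capacities must sum to 100 percent.
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, percent in queues.items():
        props[f"yarn.scheduler.capacity.root.{name}.capacity"] = str(percent)
    return props

# Hypothetical layout: one queue for scheduled jobs, one for ad hoc work.
props = capacity_scheduler_properties({"etl": 60, "adhoc": 40})
for key, value in props.items():
    print(f"{key}={value}")
```

A Tez job can then be submitted to one of these queues by setting tez.queue.name, for example tez.queue.name=etl with the queue names assumed here.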