Here are some stats before the question:
From Ambari -> YARN -> Stats:
Number of available containers : 1000+
Number of YARN applications running : 22 (of which 18 are Hive-Tez based)
Number of allocated containers : 780
Number of containers pending for allocation : 12000+
My question is: when queue burst is enabled up to 100% of cluster capacity, why does the Resource Manager not allocate containers to the pending/running jobs? What could explain such a large number of containers pending allocation?
Is it that Tez calculates the number of containers required to complete the job up front, adds them all to the pending allocation count, and has them allocated gradually as they are needed (for example, the reduce stage's 3000 containers being blocked on the map stage)?
Can anyone enlighten me, please?
I am far from a YARN "expert", but my understanding of the above is that the jobs run in stages. The job itself has a good idea of the number of containers that will need to be launched to complete the entire job. Adding the stage 2 containers to the pending queue allows containers to be pre-warmed, so that when a stage 1 task finishes, its resulting data can be passed to a pre-warmed container for a stage 2 task (shaving a small amount of time off execution).
A job with several stages cannot start its stage 2 containers until the data set from stage 1 is complete; that is why those containers are still pending.
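To make that reasoning concrete, here is a rough toy model in Python. It is only a sketch of the idea described above, not how YARN or Tez actually schedule: the `Stage` class, the `simulate` function, the 1000 map tasks and the capacity of 780 are all made up for illustration (only the 3000 reduce containers come from the question). It counts every not-yet-launched task as "pending", even when its upstream stage has not finished.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    name: str
    tasks: int                        # one container is requested per task
    depends_on: Optional[str] = None  # name of the upstream stage, if any


def simulate(stages: List[Stage], capacity: int) -> List[Tuple[int, int]]:
    """Return one (allocated, pending) snapshot per scheduling round.

    Toy assumptions: every task runs for exactly one round, and a stage
    may only launch once its upstream stage has completely finished.
    """
    remaining = {s.name: s.tasks for s in stages}  # tasks not yet launched
    done = set()                                   # fully finished stages
    history = []
    while any(remaining.values()):
        free = capacity
        for s in stages:
            upstream_done = s.depends_on is None or s.depends_on in done
            if upstream_done and remaining[s.name] > 0:
                launch = min(free, remaining[s.name])
                remaining[s.name] -= launch
                free -= launch
        # Pending includes blocked downstream tasks, not just runnable ones.
        history.append((capacity - free, sum(remaining.values())))
        # Tasks launched this round finish before the next round starts.
        for s in stages:
            if remaining[s.name] == 0:
                done.add(s.name)
    return history


if __name__ == "__main__":
    job = [Stage("map", 1000), Stage("reduce", 3000, depends_on="map")]
    for allocated, pending in simulate(job, capacity=780):
        print(f"allocated={allocated:4d}  pending={pending:5d}")
```

In the second round of this toy run, only 220 of the 780 containers are in use while 3000 are still reported as pending, because the reduce tasks are waiting on the map stage. That is the same shape as the numbers in the question, under the assumptions stated above.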
I hope someone can elaborate on this answer, or correct me if it is not entirely correct.