Sorry, I was not able to come back to the forum for a while. According to your latest screenshot (08-23), the total amount of memory in the cluster is 312 GB, unless you have changed the cluster since 08-03. Therefore, I think what you experienced on Aug 3rd is expected: the resources were not available in the cluster. Even if you did have 16 GB left in the cluster, that 16 GB could be fragmented across the nodes, so that no single node has enough free memory to run a 6 GB container.
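To make the fragmentation point concrete, here is a minimal sketch. The per-node numbers are made up for illustration and are not taken from your cluster; the function names are mine as well. It only shows that the total free memory can exceed the container size while no single node can host it.

```python
# Hypothetical free memory per NodeManager, in GB.
# These values are illustrative only, not read from any real cluster.
free_per_node_gb = [4, 4, 4, 4]


def total_free(free_gb):
    """Total free memory across all nodes, in GB."""
    return sum(free_gb)


def fits_on_some_node(free_gb, container_gb):
    """A container must fit entirely on one node; it cannot span nodes."""
    return any(free >= container_gb for free in free_gb)


print(total_free(free_per_node_gb))            # 16
print(fits_on_some_node(free_per_node_gb, 6))  # False
```

So 16 GB is free in total, but a 6 GB container request still cannot be satisfied because each node only has 4 GB to offer.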
I am experiencing exactly the same issue. I am attaching 3 images to show what's happening. The RM logs show nothing unusual...
Using CDH 5.4.5
Increasing the ZooKeeper jute.maxbuffer setting fixed the problem for me. I increased it from 12 MB to 48 MB and did a rolling restart.
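For anyone trying the same thing: jute.maxbuffer is a Java system property and its value is given in bytes, so the MB figures need converting. Below is a small sketch of that conversion. The helper name is hypothetical, and where you actually set the flag depends on how ZooKeeper and its clients are managed in your setup, which I am not assuming here. As far as I know, the value should be consistent across the ZooKeeper servers and the clients that talk to them.

```python
def jute_maxbuffer_flag(megabytes):
    """Build the JVM flag for a jute.maxbuffer size given in MB.

    Hypothetical helper for illustration; the property itself takes bytes.
    """
    size_bytes = megabytes * 1024 * 1024
    return f"-Djute.maxbuffer={size_bytes}"


print(jute_maxbuffer_flag(12))  # -Djute.maxbuffer=12582912
print(jute_maxbuffer_flag(48))  # -Djute.maxbuffer=50331648
```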