My NodeManagers and ResourceManager (on different nodes) keep exiting unexpectedly. I have not overcommitted memory on any of my nodes. The workload should be reasonable for m4.4xlarge instances, which is what we are using. The worker nodes run only YARN and HDFS.
I don't see anything relevant in the logs. The NodeManagers exit at random, in groups of 4 or 5 at a time.