I have set up a Hadoop cluster with 21 nodes (1 NameNode and ResourceManager + 20 DataNodes). While running jobs, some DataNodes sometimes log off automatically, but the other DataNodes and the jobs keep running. I would like to know why this is happening.
With such a number of nodes, you should have set up NameNode and ResourceManager HA!
Can you also give the HDP version, the memory, and whether the cluster is Kerberized or not?