
Datanode failure

Explorer

Under what circumstances could a MapReduce job fail or be terminated when one of the DataNodes goes down?

1 ACCEPTED SOLUTION

Master Mentor

Golden rule for MRv2: a Hadoop cluster should always have an odd number of data nodes (3, 5, 7, 9, etc.). Because of the distributed workload architecture, any failed job is automatically restarted on the surviving data nodes. Remember to set the mapreduce.jobtracker.restart.recover parameter to true in mapred-site.xml, and don't forget to set the number of retries with the mapreduce.map.maxattempts parameter (whose default lives in mapred-default.xml).
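The two parameters mentioned above would be set roughly like this; a minimal mapred-site.xml sketch (property names are taken from the answer, the maxattempts value shown is illustrative):

```xml
<configuration>
  <!-- Recover and restart incomplete jobs after a restart (MRv1-era JobTracker setting) -->
  <property>
    <name>mapreduce.jobtracker.restart.recover</name>
    <value>true</value>
  </property>
  <!-- Maximum number of attempts per map task before the task is marked failed -->
  <property>
    <name>mapreduce.map.maxattempts</name>
    <value>4</value>
  </property>
</configuration>
```

Note that mapred-default.xml ships with the distribution and should not be edited directly; site-specific overrides belong in mapred-site.xml.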


2 REPLIES


A MapReduce job would not fail on a typical HDP cluster unless there is only one DataNode + NodeManager in the cluster. Tasks running on a DataNode would fail if that node goes down, but those failed tasks would be re-scheduled on other DataNodes that hold another replica of the data, and the MR job would continue.
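The failover behavior described here relies on HDFS block replication: with the default replication factor of 3, at least one other node already holds a copy of the failed task's input split. A minimal hdfs-site.xml sketch (the value shown is HDFS's default):

```xml
<configuration>
  <!-- Number of replicas kept for each HDFS block; with 3, losing one
       DataNode still leaves two nodes where failed tasks can be re-run -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```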
