Created 12-10-2015 09:36 PM
Under what circumstances could we see a MapReduce job fail or get terminated when one of the DataNodes goes down?
Created 12-10-2015 10:20 PM
A golden rule for MRv2: a Hadoop cluster should always have an odd number of data nodes (3, 5, 7, 9, etc.). Because of the distributed workload architecture, any failed job is automatically restarted on the surviving data nodes. Remember to set the mapreduce.jobtracker.restart.recover parameter to TRUE in mapred-site.xml, and don't forget to set the number of attempts via the mapreduce.map.maxattempts parameter in mapred-default.xml.
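For illustration, here is a minimal mapred-site.xml sketch of the two properties named above. The values are examples: mapreduce.map.maxattempts defaults to 4, and mapreduce.jobtracker.restart.recover is a JobTracker-era (MRv1) property, so it may be ignored on a pure YARN/MRv2 cluster.

<configuration>
  <!-- Recover in-flight jobs after a JobTracker restart (JobTracker-era/MRv1 property) -->
  <property>
    <name>mapreduce.jobtracker.restart.recover</name>
    <value>true</value>
  </property>
  <!-- Maximum attempts per map task before the task, and hence the job, is marked failed; default is 4 -->
  <property>
    <name>mapreduce.map.maxattempts</name>
    <value>4</value>
  </property>
</configuration>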
Created 12-10-2015 10:03 PM
A MapReduce job would not fail on a typical HDP cluster unless there is only one DataNode + NodeManager in the cluster. Tasks running on a DataNode will fail if that node goes down, but the failed tasks from the lost node (DataNode + NM) are reallocated to other DataNodes where another replica of the data is present, and the MR job continues.
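As a reference point for the failover described above: the ResourceManager only declares a NodeManager (and the tasks running on it) lost after a liveness timeout, and then reschedules those tasks elsewhere. A hedged yarn-site.xml sketch; to the best of my knowledge the default in yarn-default.xml is 600000 ms (10 minutes):

<!-- yarn-site.xml: how long the ResourceManager waits for NodeManager heartbeats
     before marking the node lost and rescheduling its tasks on other nodes -->
<property>
  <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value>
</property>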