
Datanode failure

Explorer

Under what circumstances could a MapReduce job fail or be terminated when one of the DataNodes goes down?

1 ACCEPTED SOLUTION

Master Mentor

Golden rule for MRv2: a Hadoop cluster should always have an odd number of data nodes (3, 5, 7, 9, etc.). Because of the distributed workload architecture, any failed job is automatically restarted on the surviving data nodes. Remember to set the mapreduce.jobtracker.restart.recover parameter to true in mapred-site.xml, and don't forget to set the number of retries with the mapreduce.map.maxattempts parameter (whose default lives in mapred-default.xml).
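The two parameters mentioned above would be set roughly like this; a minimal mapred-site.xml sketch (property names are taken from the answer, the maxattempts value shown is illustrative):

```xml
<configuration>
  <!-- Recover and restart incomplete jobs after a restart (MRv1-era JobTracker setting) -->
  <property>
    <name>mapreduce.jobtracker.restart.recover</name>
    <value>true</value>
  </property>
  <!-- Maximum number of attempts per map task before the task is marked failed -->
  <property>
    <name>mapreduce.map.maxattempts</name>
    <value>4</value>
  </property>
</configuration>
```

Note that mapred-default.xml ships with the distribution and should not be edited directly; site-specific overrides belong in mapred-site.xml.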


2 REPLIES


A MapReduce job would not fail on a typical HDP cluster unless there is only one DataNode + NodeManager in the cluster. Tasks running on a DataNode would fail if that node goes down, but those failed tasks would be re-scheduled on other DataNodes that hold another replica of the data, and the MR job would continue.
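The failover behavior described here relies on HDFS block replication: with the default replication factor of 3, at least one other node already holds a copy of the failed task's input split. A minimal hdfs-site.xml sketch (the value shown is HDFS's default):

```xml
<configuration>
  <!-- Number of replicas kept for each HDFS block; with 3, losing one
       DataNode still leaves two nodes where failed tasks can be re-run -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```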
