I had to rebuild our entire Hadoop cluster from scratch after the CentOS host running the NameNode (NN) failed.
We had no backup, so it was a real disaster!
I would like to prevent this from happening again, so please tell me what kind of backup I need.
I have one master (Cloudera Manager and the DB)
and 4 workers.
Please let me know if you need more details...
You can enable "High Availability" (HA). This is standard industry practice for fault tolerance.
1. With high availability for the NN, you run two NNs on two different masters, so that even if one master goes down, the other takes over without interruption.
2. One NN is always active and the other one is on standby.
3. Make sure both masters have the same configuration; this helps the NNs stay in sync and has a few other advantages.
4. High Availability (HA) is not only for the NN; a few other services also support HA. You can refer to the link below to learn more.
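
Once HA is enabled, it can be useful to verify point 2 (one active NN, one standby) from a script. The following is only a minimal sketch, not an official tool: it assumes the `hdfs` CLI is on the PATH of the host where it runs, and the NameNode IDs `nn1` and `nn2` are hypothetical placeholders, so substitute the IDs configured for your own nameservice. It relies on `hdfs haadmin -getServiceState`, which prints the HA state of a given NN.

```python
import subprocess


def namenode_state(service_id):
    """Return the HA state ('active' or 'standby') reported for one NameNode."""
    result = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", service_id],
        capture_output=True,
        text=True,
        check=True,  # raise if the command fails, e.g. the NN is unreachable
    )
    return result.stdout.strip()


def ha_is_healthy(states):
    """True when there are at least two NNs, exactly one is active,
    and every other one is standby."""
    values = list(states.values())
    return (
        len(values) >= 2
        and values.count("active") == 1
        and values.count("standby") == len(values) - 1
    )


def check_cluster(service_ids):
    """Query every NameNode ID and report whether the pair looks healthy."""
    states = {sid: namenode_state(sid) for sid in service_ids}
    return states, ha_is_healthy(states)
```

For example, calling `check_cluster(["nn1", "nn2"])` on a cluster host returns the state of each NN together with a boolean, which you could feed into whatever alerting you already use. Keeping the health rule in `ha_is_healthy` separate from the CLI call makes it easy to test without a running cluster.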