09-06-2017
11:16 AM
Disaster recovery in a Hadoop cluster means recovering all or most of the important data stored on the cluster after a disaster such as a hardware failure, data loss, or an application error. There should be minimal or no downtime in the cluster. Disasters can be handled through several techniques:

1) Prevent metadata loss by configuring the NameNode to write its metadata (the fsimage and edit log) to a directory on a separate NFS mount in addition to its local disks. NameNode High Availability (HA), introduced in Hadoop 2, is another protection: a standby NameNode can take over if the active one fails.
2) HDFS snapshots can be used to restore files to an earlier point-in-time state.
3) Enable the Trash feature to guard against accidental deletion. With trash enabled, a file deleted through the HDFS shell is first moved to the user's trash folder in HDFS (.Trash) instead of being removed immediately.
4) The Hadoop distcp tool can copy cluster data to a mirror cluster, which can take over in case of a hardware failure.
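For item 1), the NameNode writes its metadata to every directory listed in the `dfs.namenode.name.dir` property of hdfs-site.xml. A minimal Python sketch, assuming you generate config fragments from a script, that builds this property with one local directory and one NFS-mounted directory. Both paths are hypothetical placeholders, and the NFS share is assumed to be already mounted on the NameNode host.

```python
import xml.etree.ElementTree as ET


def name_dir_property(local_dirs, nfs_dirs):
    """Build an hdfs-site.xml <property> element for dfs.namenode.name.dir.

    The NameNode writes its fsimage and edit log to every listed directory,
    so including an NFS-mounted path keeps an off-host copy of the metadata.
    """
    dirs = [f"file://{d}" for d in list(local_dirs) + list(nfs_dirs)]
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.namenode.name.dir"
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical paths: one local disk plus one NFS mount.
    print(name_dir_property(["/data/1/dfs/nn"], ["/mnt/nfs/dfs/nn"]))
```

Paste the printed element inside the `<configuration>` element of hdfs-site.xml; the NameNode must be restarted for the change to take effect.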
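For item 2), a directory must first be made snapshottable by an administrator, after which snapshots of it can be taken; each snapshot appears as a read-only tree under `<dir>/.snapshot/<name>`. The sketch below only builds the `hdfs` commands (dry run by default) so they can be reviewed before running. The directory `/data/warehouse`, the snapshot name, and the helper names are hypothetical.

```python
import subprocess


def snapshot_commands(path, name):
    """Commands to make `path` snapshottable and take a snapshot called `name`."""
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", path],  # admin step, once per directory
        ["hdfs", "dfs", "-createSnapshot", path, name],
    ]


def restore_command(path, name, relative_file):
    """Copy one file back out of the read-only .snapshot directory."""
    src = f"{path}/.snapshot/{name}/{relative_file}"
    dst = f"{path}/{relative_file}"
    # -f overwrites the current (possibly corrupted) copy if it still exists.
    return ["hdfs", "dfs", "-cp", "-f", src, dst]


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(snapshot_commands("/data/warehouse", "before-etl"))
    run([restore_command("/data/warehouse", "before-etl", "part-00000")])
```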
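For item 3), trash is controlled by `fs.trash.interval` in core-site.xml, given in minutes (0 disables it). A file removed with `hdfs dfs -rm` (without `-skipTrash`) is moved under `/user/<user>/.Trash/Current`, keeping its original absolute path, so it can be moved back. A minimal sketch, assuming a one-day retention period and a hypothetical user and file; note that once a trash checkpoint runs, the file sits under a timestamped directory instead of `Current`.

```python
def trash_property(minutes):
    """core-site.xml property enabling trash; the value is in minutes, 0 disables it."""
    return (
        "<property><name>fs.trash.interval</name>"
        f"<value>{minutes}</value></property>"
    )


def trash_location(user, deleted_path):
    """Where a shell delete puts the file: the original absolute path under .Trash/Current.

    Assumes the user's home directory is /user/<user> and no checkpoint has run yet.
    """
    return f"/user/{user}/.Trash/Current{deleted_path}"


def restore_from_trash_command(user, deleted_path):
    """Command that moves a trashed file back to its original location."""
    return ["hdfs", "dfs", "-mv", trash_location(user, deleted_path), deleted_path]


if __name__ == "__main__":
    print(trash_property(24 * 60))  # hypothetical retention: one day
    print(" ".join(restore_from_trash_command("alice", "/data/sales/2017.csv")))
```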
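For item 4), a mirror is typically kept in sync by rerunning `hadoop distcp` periodically. With `-update`, only files that are missing or differ on the target are copied, and the contents of the source directory are copied into the target directory; adding `-delete` also removes target files that no longer exist on the source. A minimal sketch that builds such a command; the NameNode addresses `prod-nn:8020` and `dr-nn:8020` and the path are hypothetical.

```python
import subprocess


def distcp_mirror_command(src_nn, dst_nn, path, delete_extra=False):
    """Build a distcp command that mirrors `path` from one cluster to another."""
    src = f"hdfs://{src_nn}{path}"
    dst = f"hdfs://{dst_nn}{path}"
    cmd = ["hadoop", "distcp", "-update"]  # copy only missing or changed files
    if delete_extra:
        cmd.append("-delete")  # drop target files that were deleted on the source
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    cmd = distcp_mirror_command("prod-nn:8020", "dr-nn:8020", "/data", delete_extra=True)
    print(" ".join(cmd))
    # To actually run it (requires a Hadoop client on PATH):
    # subprocess.run(cmd, check=True)
```

Using `-delete` makes the mirror an exact copy, which also means an accidental deletion on the source propagates at the next sync; combining distcp with snapshots or trash on the target limits that risk.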