We have NameNode HA configured on CDH4.5 and use quorum-based HA. In addition, we have configured several storage paths for the edits on the local machine (on several disks), since that is often recommended to better protect the NameNode data against disk failures.
Now my question is: does this really make sense in this kind of HA setup? The edits are stored on (and retrieved from) the quorum nodes anyway. So my first thought is that this is too many storage locations to keep in sync.
On top of that, a real disk error has now occurred on one of our clusters. All was fine (the disk was detected as read-only and ignored by the running NameNode) until we tried to restart the system for other reasons. The NameNode simply refused to come up because one of its storage locations was not working. So the multiple storage locations prevented the startup instead of being of any help in this situation.
I see the point of having multiple storage paths, but not of the way it is implemented. Frankly, I think this is an error, since the NameNode behaves inconsistently:
- it ignores the failing disk while running (which is of course the preferred behavior)
- but it refuses to start if one of its storage paths has an error (this is not OK)
There is no equivalent of "dfs.datanode.failed.volumes.tolerated" for the NameNode process. I agree it does seem strange that you are allowed to comma-separate multiple paths, yet the NameNode will not start when one of the paths isn't available.
As such, I've set up RAID for the NameNode filesystems defined by "dfs.name.dir" and "dfs.journalnode.edits.dir", so that I can recover from a drive loss behind the scenes.
As for the RAID level, that's usually more a function of what your controller can do or what you are comfortable doing with software RAID. It's worth mentioning for anyone reading that I only recommend doing this for the NameNode, JournalNode and ZooKeeper filesystems.
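For anyone who keeps several comma-separated storage paths instead of (or in addition to) RAID, a pre-restart check along these lines could flag a failed or read-only path before the NameNode refuses to start. This is only a rough sketch: the function names are made up for illustration, it is not part of Hadoop, and it assumes the configured value holds plain local paths rather than file:// URIs.

```python
import os
import tempfile


def split_dirs(value):
    """Split a comma-separated directory list, dropping blanks and whitespace."""
    return [p.strip() for p in value.split(",") if p.strip()]


def is_writable(path):
    """Return True if a temporary file can be created (and removed) in path.

    A missing directory, a non-directory, or a read-only filesystem all
    raise an OSError subclass, which is reported here as "not writable".
    """
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


def failed_dirs(value):
    """Return the configured directories that fail the write check."""
    return [p for p in split_dirs(value) if not is_writable(p)]
```

Calling failed_dirs() with the value you have configured for "dfs.name.dir" returns the paths that need attention; an empty list means every path accepted a test write. The check has to run as the same user as the NameNode, otherwise permission results will not match.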
Since the non-active NameNode is responsible for checkpointing, it will always have an up-to-date copy of the FSImage file should the active NameNode completely disappear. Since you are replicating the edits via the QJM, the edits will be available as well.
Backups of the FSImage are still recommended.
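One way to script such a backup, as a hedged sketch: the "hdfs dfsadmin -fetchImage" command downloads the most recent FSImage from the NameNode into a local directory, and a small wrapper can put each copy into a timestamped folder. The function names and the backup layout below are assumptions for illustration; please verify that -fetchImage is available in your CDH version before relying on it.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def fetch_image_command(target_dir):
    """Build the dfsadmin call that saves the latest FSImage into target_dir."""
    return ["hdfs", "dfsadmin", "-fetchImage", str(target_dir)]


def backup_fsimage(backup_root, now=None, run=subprocess.run):
    """Fetch the current FSImage into a new timestamped directory.

    backup_root is a hypothetical local backup location. The `now` and
    `run` parameters exist only so the function can be exercised without
    a cluster; by default the real clock and subprocess.run are used.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / stamp
    # Fail loudly instead of overwriting an existing backup directory.
    target.mkdir(parents=True, exist_ok=False)
    # check=True raises CalledProcessError if the hdfs command fails.
    run(fetch_image_command(target), check=True)
    return target
```

Run from cron as a user allowed to call dfsadmin, e.g. backup_fsimage("/backup/fsimage"), where the path is just an example. Pruning old copies is left out on purpose.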
Ok, thanks for the reply.
We will probably go the RAID way as well then, even if I don't like it much. Especially if you want to go for cloud and VMware, RAID somehow spoils the fun.