
Site high availability for hadoop



How should site-level high availability be implemented for Hadoop?

Which strategy is better: one cluster stretched across multiple sites, multiple clusters with replication, or something else?



Re: Site high availability for hadoop

The latter, definitely the latter. It will take a bit more work to get full HA across sites, but it'll be more straightforward.

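With the default HDFS placement policy and a replication factor of three, the first replica goes on the writer's local node (or a random node if the client is not running on a DataNode), the second on a node in a different rack, and the third on a different node in that same remote rack. The policy knows about racks, not sites, so you can't guarantee that any replica of a given block lands at the other location. You could force it by defining only two racks, one per site, but then rack locality within each site is blown. There are also issues with the master services. You would probably run into trouble writing metadata to both locations. ZooKeeper needs a strict majority of its ensemble to be up, so site two would never hold a quorum on its own if site one went down, unless site two hosted the majority, in which case losing site two would cause the same problem.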
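
To make the two problems above concrete, here is a rough Python sketch. It is a simplified model under stated assumptions, not Hadoop or ZooKeeper code: the function names, rack names, and site layout are all hypothetical, and placement is reduced to "writer's rack, then two nodes in one other rack", following the default policy described in the HDFS architecture documentation.

```python
import random

# Hypothetical helper names; a simplified model, not Hadoop or ZooKeeper code.


def has_quorum(alive: int, ensemble_size: int) -> bool:
    """A ZooKeeper ensemble needs a strict majority of its servers alive."""
    return alive > ensemble_size // 2


def quorum_survives(servers_per_site: dict, failed_site: str) -> bool:
    """Check whether the sites that remain still hold a majority."""
    total = sum(servers_per_site.values())
    alive = total - servers_per_site[failed_site]
    return has_quorum(alive, total)


def place_replicas(nodes_by_rack: dict, writer_rack: str, rng: random.Random) -> list:
    """Model 3-replica placement: writer's rack, then two nodes in one other rack."""
    first = (writer_rack, rng.choice(nodes_by_rack[writer_rack]))
    remote_rack = rng.choice(sorted(r for r in nodes_by_rack if r != writer_rack))
    second, third = rng.sample(nodes_by_rack[remote_rack], 2)
    return [first, (remote_rack, second), (remote_rack, third)]


def sites_holding(replicas: list, site_of_rack: dict) -> set:
    """Return the set of sites that hold at least one replica."""
    return {site_of_rack[rack] for rack, _node in replicas}


if __name__ == "__main__":
    # Assumed layout: site A has two racks, site B has one.
    racks = {"a1": ["n1", "n2"], "a2": ["n3", "n4"], "b1": ["n5", "n6"]}
    site_of_rack = {"a1": "A", "a2": "A", "b1": "B"}
    rng = random.Random(0)

    trials = 1000
    only_a = sum(
        sites_holding(place_replicas(racks, "a1", rng), site_of_rack) == {"A"}
        for _ in range(trials)
    )
    print(f"blocks with no replica at site B: {only_a} of {trials}")

    # Three ZooKeeper servers split 2 + 1 across the sites.
    ensemble = {"A": 2, "B": 1}
    for site in ensemble:
        print(f"lose site {site}: quorum survives = {quorum_survives(ensemble, site)}")
```

In this model, a writer in site A can pick site A's second rack as the "remote" rack, so some blocks end up with no copy at site B at all. The quorum check also shows why adding servers to site two does not help: whichever site holds the majority becomes the single point of failure, and an even split leaves neither site with a majority.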