New Contributor
Posts: 1
Registered: ‎01-16-2017

Site high availability for Hadoop

How do I implement site high availability for Hadoop?

Which strategy should I use: one cluster stretched across multiple sites, multiple clusters with replication, or something else?


Posts: 642
Topics: 3
Kudos: 121
Solutions: 67
Registered: ‎08-16-2016

Re: Site high availability for Hadoop

The latter, definitely the latter. It will take a bit more work to get full HA across sites, but it'll be more straightforward.

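With the default HDFS write path, the first replica goes on the local node (or a random node if the client isn't running on a DataNode), the second on a node in a different rack, and the third on another node in the same rack as the second. You don't control which racks get picked, though, so you can't guarantee that enough replicas end up at the other location. You could force it by having only two racks, one per site, but then rack locality is blown.

There are also the issues with the master services. You would probably run into problems writing metadata to both locations. ZooKeeper also needs a strict majority of its ensemble to form a quorum. With the ensemble split across two sites, one site must hold at least half of the nodes, so if that site (say, site one) went down, the ZooKeeper nodes in site two could never reach quorum.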
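Both arguments can be checked with a small model. This is a minimal Python sketch, not HDFS or ZooKeeper code: `DataNode`, `place_replicas`, and `zk_has_quorum` are hypothetical names, and it assumes a replication factor of 3 and exactly two racks (one per site, i.e. the "only two racks" workaround) with at least two DataNodes each. It ignores the load and free-space checks the real placement policy also applies.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str  # assumption: each site is mapped to a single rack


def place_replicas(nodes, writer=None, rng=random):
    """Simplified model of the default HDFS placement for 3 replicas.

    Assumes exactly two racks with at least two DataNodes each and
    ignores the load/space checks the real policy also applies.
    """
    # 1st replica: the writer's own node, otherwise a random node.
    first = writer if writer in nodes else rng.choice(nodes)
    # 2nd replica: a node on a different rack than the first.
    second = rng.choice([n for n in nodes if n.rack != first.rack])
    # 3rd replica: a different node on the same rack as the second.
    third = rng.choice(
        [n for n in nodes if n.rack == second.rack and n != second]
    )
    return [first, second, third]


def zk_has_quorum(ensemble_size, surviving):
    """ZooKeeper needs a strict majority of the ensemble to be up."""
    return surviving > ensemble_size // 2


if __name__ == "__main__":
    site_a = [DataNode(f"a{i}", "/siteA") for i in range(3)]
    site_b = [DataNode(f"b{i}", "/siteB") for i in range(3)]
    replicas = place_replicas(site_a + site_b, writer=site_a[0])
    # Every block spans both sites: one local copy, two remote copies.
    print(sorted({n.rack for n in replicas}))  # ['/siteA', '/siteB']
    # 3-node ensemble split 2 (site A) / 1 (site B); site A goes down.
    print(zk_has_quorum(3, surviving=1))  # False
```

With each site mapped to one rack, every block gets one replica at the writer's site and two at the other site, which is why the two-rack trick guarantees cross-site copies but leaves rack awareness with nothing useful to say inside a site. The quorum check shows why no split of a ZooKeeper ensemble over two sites survives the loss of the larger half.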