Hi,
I am planning a setup with two data centers, with the requirement that the cluster survives the failure of an entire data center. What would be the preferred setup?
a) ONE Hadoop cluster spanning both data centers, or
b) TWO independent Hadoop clusters with (somehow) synced data
Questions:
- For option a), it seems obvious that the interconnect between the data centers needs to be very good. Would at least 1 Gbit/s be required?
- Is it possible to configure Hadoop, via the rack topology script, to replicate blocks to different data centers in preference to replicating to different racks?
- If option b) is chosen, how can automatic, continuous data replication between the two clusters be established? Are there tools for this?
- What are the main considerations and recommendations for the requirement stated at the top?
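
To make the topology question more concrete, here is a minimal sketch of the kind of rack topology script I have in mind. It assumes the usual contract: Hadoop invokes the script with one or more host names or IP addresses as arguments and reads one network path per argument from standard output. The IP addresses, the /dcN/rackN naming, and the default location are all made up for illustration. Whether the block placement actually takes the data center level of such a path into account is exactly what I am unsure about.

```python
#!/usr/bin/env python3
"""Hypothetical topology script: maps hosts to /datacenter/rack paths."""
import sys

# Hypothetical host-to-location table; addresses and names are invented
# for illustration and do not describe a real cluster.
TOPOLOGY = {
    "10.1.1.11": "/dc1/rack1",
    "10.1.2.11": "/dc1/rack2",
    "10.2.1.11": "/dc2/rack1",
    "10.2.2.11": "/dc2/rack2",
}

# Assumed fallback for hosts missing from the table.
DEFAULT_LOCATION = "/default-dc/default-rack"


def resolve(host):
    """Return the network path for a host, or the fallback if unknown."""
    return TOPOLOGY.get(host, DEFAULT_LOCATION)


def main(hosts):
    # Emit one path per argument, in the same order as the arguments.
    for host in hosts:
        print(resolve(host))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, calling the script with the arguments 10.1.1.11 and 10.2.1.11 would print /dc1/rack1 followed by /dc2/rack1.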
Many thanks in advance,
Gerd