Best practice for data replication/sync between two data centers

Hi,

I'm considering running two data centers, with the requirement that the cluster must survive the failure of an entire data center. What would be the preferred setup?

 

a) ONE Hadoop cluster spanning both data centers, or

b) TWO independent Hadoop clusters with (somehow) synced data

 

Questions:

  • For option a), it seems obvious that the interconnect between the data centers needs to be very good. At least 1 Gbit/s?
  • Is it possible to configure Hadoop, via the rack topology script, to replicate blocks across data centers with higher priority than across racks?
  • If option b) is chosen, how can automatic, continuous data replication between the two clusters be set up? Are there tools for this?
  • What are the main considerations and recommendations for the requirement described above?
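To put the 1 Gbit/s figure into perspective, here is a back-of-the-envelope sketch that estimates how long it takes to move a given data volume over the inter-site link. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbit = 10^9 bits) and a constant usable fraction of the nominal link speed; the volumes and the efficiency factor are illustrative assumptions, not measurements.

```python
def transfer_hours(data_tb: float, link_gbit_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move data_tb terabytes over a link_gbit_per_s link.

    Assumes decimal units and that only `efficiency` (0..1] of the nominal
    link speed is actually available for the transfer.
    """
    bits = data_tb * 1e12 * 8                       # TB -> bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600          # seconds -> hours


if __name__ == "__main__":
    # 1 TB over an idealized 1 Gbit/s link: 8e12 bits / 1e9 bit/s = 8000 s
    print(f"{transfer_hours(1, 1):.2f} h")  # 2.22 h
    # Same volume, assuming only 70% of the link is usable
    print(f"{transfer_hours(1, 1, 0.7):.2f} h")
```

Plugging in the expected daily ingest volume instead of 1 TB gives a first feel for whether a given link can keep up.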
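Regarding the topology script: Hadoop calls the script configured in `net.topology.script.file.name` with one or more host names or IP addresses as arguments and reads one network location per argument from stdout. The sketch below reports a two-level `/<datacenter>/<rack>` path. The host-to-location table and the `dc1`/`dc2` names are hypothetical. Whether HDFS's default block placement would treat the first path component as a separate data center level, rather than just as part of the rack name, is exactly the open question above; the script only shows how such a location could be reported.

```python
#!/usr/bin/env python3
"""Minimal sketch of a Hadoop rack topology script reporting /<dc>/<rack>."""
import sys

# Hypothetical inventory: host or IP -> (datacenter, rack).
# In practice this would be loaded from a file maintained with the cluster.
HOST_LOCATIONS = {
    "10.1.0.11": ("dc1", "rack1"),
    "10.1.0.12": ("dc1", "rack2"),
    "10.2.0.11": ("dc2", "rack1"),
    "10.2.0.12": ("dc2", "rack2"),
}

# Fallback for unknown hosts (Hadoop's conventional default location).
DEFAULT_LOCATION = "/default-rack"


def resolve(host: str) -> str:
    """Return the network location path for a single host or IP."""
    location = HOST_LOCATIONS.get(host)
    if location is None:
        return DEFAULT_LOCATION
    datacenter, rack = location
    return f"/{datacenter}/{rack}"


def main(argv: list) -> None:
    # One location per argument, in the same order, space-separated.
    print(" ".join(resolve(h) for h in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `./topology.py 10.1.0.11 10.2.0.12` prints `/dc1/rack1 /dc2/rack2`.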
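For option b), one existing tool is Hadoop's DistCp (`hadoop distcp`), which copies data between clusters as a MapReduce job. Rerunning it on a schedule gives periodic rather than truly continuous replication. The sketch below builds an incremental DistCp command line: `-update` copies only files that are missing or appear to differ at the target, `-delete` (valid only together with `-update` or `-overwrite`) removes target files no longer present at the source, `-m` sets the number of map tasks, and `-bandwidth` caps MB/s per map. The NameNode addresses and default values are hypothetical.

```python
import shlex
import subprocess
from typing import Optional


def distcp_command(src: str, dst: str, maps: int = 20,
                   bandwidth_mb: Optional[int] = None,
                   delete_extra: bool = False) -> list:
    """Build a `hadoop distcp` argument list for an incremental copy."""
    cmd = ["hadoop", "distcp", "-update", "-m", str(maps)]
    if bandwidth_mb is not None:
        cmd += ["-bandwidth", str(bandwidth_mb)]  # MB/s per map task
    if delete_extra:
        cmd.append("-delete")  # mirror deletions from src to dst
    cmd += [src, dst]
    return cmd


def sync_once(src: str, dst: str, **kwargs) -> int:
    """Run one DistCp pass; intended to be triggered periodically (e.g. cron)."""
    cmd = distcp_command(src, dst, **kwargs)
    print("running:", shlex.join(cmd))
    try:
        return subprocess.run(cmd, check=False).returncode
    except FileNotFoundError:
        print("hadoop CLI not found on PATH")
        return 127


if __name__ == "__main__":
    # Hypothetical NameNode addresses for the two data centers.
    sync_once("hdfs://nn-dc1:8020/data", "hdfs://nn-dc2:8020/data",
              bandwidth_mb=50, delete_extra=True)
```

Keeping `-delete` optional is deliberate: mirroring deletions also propagates accidental deletes to the second site.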

 

Many thanks in advance... Gerd