Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Best practice for data replication/sync between two data centers

Hi,

I'm planning for two data centers, with the requirement that the cluster must survive the failure of an entire data center. What would be the preferred setup?

 

a) ONE Hadoop cluster spanned over both data centers, or

b) TWO independent Hadoop clusters with (somehow) synced data

 

Questions:

  • For option a), it seems obvious that the interconnect between the data centers needs to be very good. Is at least 1 Gbit/s required?
  • Is it possible to configure Hadoop to replicate blocks to different data centers, taking precedence over replicating to different racks via the rack topology script?
  • If option b) is chosen, how can automatic, continuous data replication between the two clusters be established? Are there tools for this?
  • What are the main considerations and recommendations for the requirement mentioned at the start?
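
To make the second question concrete, here is a minimal sketch of what I mean by a rack topology script that also encodes the data center. Hadoop calls the script configured under net.topology.script.file.name with one or more IP addresses or host names as arguments and reads one location per argument from standard output. The subnet prefixes, the /dcN/rackN paths, and the default location are hypothetical values made up for illustration. Whether block placement would actually honor the data center level, rather than only the rack level, is exactly what I am unsure about.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack topology script with a data center level.

All subnet prefixes and location paths below are hypothetical.
"""
import sys

# Hypothetical mapping from IP prefix to /datacenter/rack location.
TOPOLOGY = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
    "10.2.1.": "/dc2/rack1",
    "10.2.2.": "/dc2/rack2",
}

# Fallback for unknown hosts. It has the same depth (two levels) as the
# entries above, since Hadoop expects all locations to have equal depth.
DEFAULT_LOCATION = "/default-dc/default-rack"


def resolve(host):
    """Return the network location for a single IP address."""
    for prefix, location in TOPOLOGY.items():
        if host.startswith(prefix):
            return location
    return DEFAULT_LOCATION


def main(argv):
    # Print one location per argument, in the order the arguments were given.
    print(" ".join(resolve(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Called as "topology.py 10.1.1.5 10.2.2.9", the script would print the location of each address in order, so nodes in the second data center end up under a different top-level path than nodes in the first.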

 

Many thanks in advance,
Gerd