Contributor
Posts: 39
Registered: ‎02-15-2017

Physical Cluster Moving

Hello,

I need to relocate a physical cluster from one data center to a new one. Here is some information about the setup:

  • NNs: 2
  • DNs: 48
  • Number of racks: 4
  • Rack awareness: yes
  • HA: enabled for HDFS, YARN and HBase
  • HBase: yes
  • Used space: ~500TB (before replication).
  • RF: 3 (default).
  • Average memory: 128GB
  • Average CPU cores: 32
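
As a rough sanity check on these figures, the raw footprint can be worked out with a small sketch. It assumes the ~500TB is the logical (pre-replication) size and that blocks are spread evenly across the DataNodes, which will not hold exactly on a real cluster.

```python
# Figures taken from the list above; the even spread across DataNodes is an assumption.
LOGICAL_TB = 500
REPLICATION_FACTOR = 3
DATANODES = 48


def raw_storage_tb(logical_tb, replication_factor):
    """Raw disk space consumed once every block is stored `replication_factor` times."""
    return logical_tb * replication_factor


def per_node_tb(raw_tb, datanodes):
    """Average raw data held by one DataNode, assuming a perfectly even spread."""
    return raw_tb / datanodes


raw = raw_storage_tb(LOGICAL_TB, REPLICATION_FACTOR)
print(f"Raw footprint: ~{raw} TB")
print(f"Average per DataNode: ~{per_node_tb(raw, DATANODES):.2f} TB")
```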

Having said that, let's get to the point: there is no backup, and there is no way to guarantee that the nodes will keep working after being powered off and moved.
Some nodes have not been powered off for more than 500 days, and I have no idea how their disks will handle the move.
I mention this because losing several nodes from different racks at the same time could cause data loss. Again, there is no backup.
I need to ensure the cluster is up and running after the move, without data loss.

I've been thinking of two different approaches to mitigate this risk, as a kind of disaster recovery plan:

  1. Back up all the data (which could take more than 2 months). Here I would need to work out how to deal with daily aggregations/incoming data.
  2. Create another cluster and replicate the data to it (the cluster could be physical or in the cloud).
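
To put the two-month figure in perspective, here is a rough sketch of the sustained throughput it implies, and of how daily incoming data stretches the time needed for a copy to catch up. The 60-day window stands in for "2 months"; the copy rate and daily ingest used in the second function are hypothetical placeholders, not measurements from this cluster.

```python
SECONDS_PER_DAY = 86_400


def required_throughput_mb_s(data_tb, days):
    """Sustained rate (MB/s, decimal units) needed to copy `data_tb` in `days`."""
    total_bytes = data_tb * 1e12
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6


def days_to_catch_up(data_tb, copy_tb_per_day, ingest_tb_per_day):
    """Days until the copy is level with a source that keeps growing.

    The backlog shrinks by (copy rate - ingest rate) per day, so the copy
    only converges when it is faster than the incoming data.
    """
    net_rate = copy_tb_per_day - ingest_tb_per_day
    if net_rate <= 0:
        raise ValueError("copy rate must exceed ingest rate to ever catch up")
    return data_tb / net_rate


# ~500 TB in 60 days (the "more than 2 months" estimate above)
print(f"{required_throughput_mb_s(500, 60):.2f} MB/s sustained")

# Hypothetical rates: copying 10 TB/day while 2 TB/day of new data arrives
print(f"{days_to_catch_up(500, 10, 2):.1f} days to catch up")
```

Either approach therefore needs an incremental step after the initial bulk copy, so that data written during the copy is carried over before the move.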

I was wondering if maybe someone could help me with this.
Any suggestion is welcome.

Many thanks!

Posts: 1,754
Kudos: 371
Solutions: 279
Registered: ‎07-31-2013

Re: Physical Cluster Moving

Both of the plans you describe seem viable, since each keeps a copy of the data out of reach of the risks of the move.

Another option could be to increase the redundancy of the data within the cluster, to offset some of the probability of loss. For example, a replication factor of 4 or 5 may be more resilient than 3 on the same cluster, assuming enough space is available. The factor can be lowered back down after the move.
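
As a rough illustration of why more replicas help, the sketch below estimates the chance that a single block loses every replica when some number of DataNodes fail at once. It is a simplification that assumes replicas sit on distinct nodes chosen uniformly at random; it ignores rack-aware placement, and the count of 5 failed nodes is a hypothetical example. The 48 nodes and ~500TB come from the original post.

```python
from math import comb

DATANODES = 48      # from the original post
LOGICAL_TB = 500    # from the original post, before replication
FAILED_NODES = 5    # hypothetical number of nodes lost in the move


def block_loss_probability(nodes, failed, replicas):
    """Chance that all replicas of one block are on failed nodes.

    Simplified model: replicas are on distinct nodes picked uniformly at
    random, so this is C(failed, replicas) / C(nodes, replicas).
    """
    return comb(failed, replicas) / comb(nodes, replicas)


for rf in (3, 4, 5):
    p = block_loss_probability(DATANODES, FAILED_NODES, rf)
    raw_tb = LOGICAL_TB * rf
    print(f"RF={rf}: raw space ~{raw_tb} TB, per-block loss probability {p:.2e}")
```

Each extra replica costs roughly another 500TB of raw space in this setup. The factor can be changed on existing files with `hdfs dfs -setrep -w <replicas> <path>`, where `-w` waits for the replication to complete.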