How to move a cluster?

Explorer

I have a cluster of about 20 DataNodes. Suppose I need to shut about half of them off, let's say to move them across the room. My impression is that the correct action is to stop all services on all nodes, including the Primary NameNode, and then shut down the nodes to be moved. 1) Is this correct? 2) Is there any risk of losing data? (Of course I have to ask.) 3) Is the restart procedure just to boot the nodes, then start all services on all nodes? 4) As I don't believe the cluster has ever been rebooted, can we test this procedure by stopping and starting all services on one node at a time while leaving the others running?

Accepted solution

Guru

I'll take a stab at addressing these questions:

1) Yes, you will need to shut down all Hadoop services on all nodes before a move like this. Otherwise, once the 10 stopped DataNodes are declared dead, HDFS will naturally try to re-replicate all the blocks they held. Since that is half your cluster, it's likely that some blocks had every replica on those 10 nodes, so they could not be re-replicated and would be reported as missing (and files containing them would be unreadable) until the nodes come back. If the NameNode were restarted in that state, it would also stay in safe mode because too few blocks had been reported. There is no risk of data loss, but it's not the way you'd want to do it.

2) If you properly shut down all services before the move, there is no risk of data loss. Just make sure the move doesn't involve giving the machines new IP addresses or hostnames; that is an entirely different operation that requires a careful migration process.

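3) Yes. Expect the NameNode to stay in safe mode for a short while after startup, until enough DataNodes have reported their blocks; it leaves safe mode on its own once that threshold is reached.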

4) As stated in point 1, shutting down individual DataNodes will cause replication churn in your cluster. Cloudera Manager (Enterprise) supports rolling restarts of your services if you'd like to maximize uptime; otherwise, stopping even a single DataNode will lead the NameNode to re-replicate its blocks, though only after a timeout. With default settings, the NameNode declares a DataNode dead after about 10.5 minutes without a heartbeat (2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval = 2 × 5 min + 10 × 3 s), and only then do its blocks begin to re-replicate to other nodes, so a node that is back within that window should not trigger re-replication.
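
The reasoning in points 1 and 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Hadoop or Cloudera tool: dead_node_timeout_s reproduces the NameNode's heartbeat-expiry arithmetic, assuming the stock defaults for dfs.heartbeat.interval (3 s) and dfs.namenode.heartbeat.recheck-interval (300000 ms). unavailable_blocks is a hypothetical helper that takes a made-up block-to-DataNode map and reports the blocks whose every replica sits on the nodes being moved.

```python
"""Sketch: how long HDFS waits before re-replicating, and which blocks
would become unavailable if a set of DataNodes is shut down.

Assumptions (illustrative, not read from a live cluster):
- the heartbeat settings default to stock Hadoop values;
- the block-to-DataNode map in __main__ is hypothetical sample data.
"""


def dead_node_timeout_s(heartbeat_interval_s=3, recheck_interval_ms=300_000):
    """Seconds without heartbeats before the NameNode declares a DataNode dead.

    Formula: 2 * dfs.namenode.heartbeat.recheck-interval
             + 10 * dfs.heartbeat.interval
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


def unavailable_blocks(block_locations, stopped_nodes):
    """Hypothetical helper (not a Hadoop API).

    Returns the sorted IDs of blocks whose replicas all live on stopped nodes,
    i.e. blocks that cannot be read or re-replicated until those nodes return.
    """
    stopped = set(stopped_nodes)
    return sorted(
        block_id
        for block_id, nodes in block_locations.items()
        if set(nodes) <= stopped  # every replica is on a stopped node
    )


if __name__ == "__main__":
    print(dead_node_timeout_s())  # 630.0 seconds, i.e. 10.5 minutes

    # Hypothetical 3-way-replicated blocks on DataNodes dn01..dn20.
    blocks = {
        "blk_1": ["dn01", "dn05", "dn14"],
        "blk_2": ["dn02", "dn07", "dn09"],  # all replicas in the moved half
        "blk_3": ["dn11", "dn15", "dn19"],
    }
    moved_half = [f"dn{i:02d}" for i in range(1, 11)]  # dn01..dn10
    print(unavailable_blocks(blocks, moved_half))  # ['blk_2']
```

On a real cluster, the actual block locations could be gathered with `hdfs fsck / -files -blocks -locations` rather than typed in by hand.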

Explorer

Thanks for the info.