Guidelines on Replacing Nodes in a Hadoop Cluster Running on Azure VMs

Rising Star

We have a 6-node cluster (2 master nodes, 3 slave nodes, and 1 edge node) on Azure VMs, deployed using Ambari. I need to replace the old VMs with VMs of larger size and memory. What would be the best strategy to replace the existing 3 slave nodes with a new set of slave nodes of a different instance type? Since we have some data in HDFS, what would be the best strategy to retain that data and bring up the cluster on the new nodes?

Would it be advisable to replace the NameNode as well, or just the slave nodes and the edge node?

1 ACCEPTED SOLUTION

Master Mentor

You should add the new nodes to the cluster and rebalance it, then decommission each old node one at a time. Since you're replacing three nodes and that's all of your HDFS, it may take some time. Consider a larger footprint, since HDFS maintains a replication factor of 3. You're best suited to use a tool like Cloudbreak, or, if you're only running ETL, discovery, or data science workloads, you can try Hortonworks Data Cloud. Both can add and remove instances, as well as provision new instances with a new machine type, easily.
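A minimal sketch of the HDFS side of that sequence (standard Apache Hadoop commands, run as the hdfs user; Ambari's Rebalance and Decommission actions drive the same steps from the UI, and the exclude-file step assumes the stock dfs.hosts.exclude setup):

    # 1. Once the new DataNodes are registered, spread existing blocks onto them
    hdfs balancer -threshold 10      # 10 percent is the default utilization spread

    # 2. Decommission one old DataNode at a time. In Ambari: Hosts > old node >
    #    DataNode > Decommission. Outside Ambari, add the host to the file named
    #    by dfs.hosts.exclude in hdfs-site.xml, then tell the NameNode to re-read it:
    hdfs dfsadmin -refreshNodes

    # 3. Wait until the node shows "Decommissioned" before moving on to the next one
    hdfs dfsadmin -report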


14 REPLIES


Rising Star

@Artem Ervits

Thanks for replying. After adding the 2 or 3 new slave nodes to the cluster using Ambari, will the data be automatically copied onto the new nodes? By data I mean both HDFS data and the root/OS partition data. And will the nodes be rebalanced automatically by Ambari, or does that have to be done manually?

Master Mentor

@rahul gulati No, the data will not be copied automatically; new data will start being balanced across HDFS, but not the existing data. You will have to invoke the balancer yourself via Ambari: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_Ambari_Users_Guide/content/_how_to_rebal...

The data residing on the OS partitions has to be copied manually by your team.
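For what it's worth, both pieces look roughly like this on the command line (the balancer call is standard HDFS and is what the Ambari rebalance action runs; the rsync path and hostname below are placeholders, not taken from this thread):

    # Rebalance HDFS; -threshold is the allowed spread in per-DataNode disk
    # utilization, in percent (smaller value = more even distribution, longer run)
    hdfs balancer -threshold 5

    # OS/root partition data lives outside HDFS, so copy it yourself, for example:
    rsync -avz /opt/appdata/ new-slave1:/opt/appdata/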

Rising Star

@Artem Ervits

OK. So once I have copied the data from the HDFS/OS mount points of the old nodes to the new nodes, I can start decommissioning the old nodes one by one. Please correct me if I am wrong.

Thanks

Master Mentor

@rahul gulati

You only need to copy the OS data. Once you add the new nodes to your cluster, issue the HDFS rebalance command. Once that is done, go to a node you intend to remove and hit Decommission; HDFS will start moving data from that node across the cluster. Continue with the next node until all old nodes are removed. I do highly recommend you add more new nodes for DataNode purposes. 3 DataNodes is just not enough; basically you have no resiliency. The more nodes you have, the faster this whole project will go for you. Also look at http://sequenceiq.com/cloudbreak-docs/latest/, which will make management of your cluster in the cloud a lot easier, including for this topic.
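A couple of standard commands that are handy while working through the old nodes one at a time (generic HDFS checks, nothing specific to this cluster):

    # Shows each DataNode's state: Normal, Decommission in progress, or Decommissioned
    hdfs dfsadmin -report

    # Before retiring the last old node, confirm there are no missing or
    # under-replicated blocks
    hdfs fsck /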

Rising Star

@Artem Ervits

I have already explored the Cloudbreak option and got a cluster up and running. It is really fast. But the problem with Cloudbreak is that it does not launch a cluster on already-running VMs in Azure; it launches new VMs for setting up the cluster. We already have VMs running in Azure on which we want to set up the cluster. That's why we are planning to go with an Ambari installation.

And on the second point: in the dev environment, since our data volume is small, we are keeping 3 DataNodes and would set the replication factor to 2. Will that work fine?

Thanks

Master Mentor

For dev purposes I guess it's fine. True about Cloudbreak, but with it you can grow and shrink your cluster easily, additionally specifying the machine type the next time you add a node. Either way is fine, but I'm glad you already entertained the idea of Cloudbreak.
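One hedged note if you do drop to a replication factor of 2 (standard HDFS behavior, not discussed above): the dfs.replication setting only applies to files written after the change, so files already in HDFS keep their old factor until you reset it explicitly, e.g.:

    # Set dfs.replication to 2 in Ambari (HDFS > Configs) and restart HDFS, then
    # lower the factor on existing data; -w waits until the target is reached
    hdfs dfs -setrep -w 2 /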

Rising Star

@Artem Ervits

Is it possible to use Cloudbreak to add nodes to a cluster deployed using Ambari? My understanding is that Cloudbreak can only add nodes to a cluster spun up through the Cloudbreak UI.

If it is possible, are there any reference links on adding/removing nodes with Cloudbreak on a cluster deployed using Ambari?

Thanks

Master Mentor

@rahul gulati No, you can't use Cloudbreak to add nodes to an existing cluster deployed by Ambari. To add nodes to a Cloudbreak-provisioned cluster, you can use the UI, or the Cloudbreak CLI for automation purposes.