Support Questions
Find answers, ask questions, and share your expertise

Guidelines on Replacing Nodes in a Hadoop Cluster Running on Azure VMs

Explorer

We have a 6-node cluster (2 master nodes, 3 slave nodes, and 1 edge node) on Azure VMs, deployed using Ambari. I need to replace the old VMs with VMs of increased size and memory. What would be the best strategy to replace the existing 3 slave nodes with a new set of slave nodes of a different instance type? Since we have some data in HDFS, what would be the best strategy to retain the data and bring up the cluster with the new nodes?

Would it be advisable to replace the NameNode as well, or just the slave nodes and the edge node?

1 ACCEPTED SOLUTION

Mentor

You should add the new nodes to the cluster and rebalance it, then decommission each old node one at a time. Since you're replacing three nodes and that's all of your HDFS, it may take some time. Consider a larger footprint, as HDFS maintains a replication factor of 3 by default. You're best suited to use a tool like Cloudbreak, or, if you're only running ETL, discovery, or data science workloads, you can try Hortonworks Data Cloud. Both can add and remove instances, as well as provision new instances with a new machine type, easily.


14 REPLIES

Mentor

You should add the new nodes to the cluster and rebalance it, then decommission each old node one at a time. Since you're replacing three nodes and that's all of your HDFS, it may take some time. Consider a larger footprint, as HDFS maintains a replication factor of 3 by default. You're best suited to use a tool like Cloudbreak, or, if you're only running ETL, discovery, or data science workloads, you can try Hortonworks Data Cloud. Both can add and remove instances, as well as provision new instances with a new machine type, easily.
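For reference, the decommission side of this workflow can also be driven from the command line. A minimal sketch, assuming a hypothetical hostname and the standard dfs.hosts.exclude mechanism; the hdfs commands need a live cluster and HDFS admin rights, so they are shown as comments here:

```shell
# 1. After adding the new slave nodes via Ambari, rebalance HDFS:
#      hdfs balancer -threshold 5   # move blocks until nodes are within 5% of average utilization

# 2. List each old node in the excludes file referenced by dfs.hosts.exclude
#    (hostname below is hypothetical):
cat > /tmp/dfs.exclude <<'EOF'
old-slave-1.example.com
EOF

# 3. Tell the NameNode to re-read the file and begin decommissioning:
#      hdfs dfsadmin -refreshNodes

# 4. Wait for the node to report "Decommissioned", then repeat for the next one.
grep -c . /tmp/dfs.exclude   # prints 1: one host currently queued for decommission
```

Doing it one node at a time, as suggested, keeps enough live replicas that HDFS can re-replicate blocks as each node drains.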

Explorer

@Artem Ervits

Thanks for replying. After adding 2-3 new slave nodes to the cluster using Ambari, will the data be copied to the new nodes automatically? By data I mean the HDFS data and the root/OS partition data. And will the nodes be rebalanced automatically by Ambari, or does that have to be done manually?

Mentor

@rahul gulati No, the data will not be copied automatically: new data will start balancing across HDFS, but the existing data will not. You will have to run the balancer yourself via Ambari, https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_Ambari_Users_Guide/content/_how_to_rebal...

The data residing on the OS has to be copied manually by your team.
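That manual OS-level copy is just an ordinary file transfer. A sketch with hypothetical paths and hostnames: local temp directories stand in for the old and new nodes so the example is self-contained, and the cross-host variant (rsync over ssh) is shown as a comment:

```shell
# Stand-ins for a mount point on the old node and its replacement:
mkdir -p /tmp/old-node-data /tmp/new-node-data
echo "local app config" > /tmp/old-node-data/app.conf

# Local copy preserving permissions/timestamps:
cp -a /tmp/old-node-data/. /tmp/new-node-data/

# Between real hosts you would run something like:
#   rsync -az /data/local/ hdfs@new-slave-1.example.com:/data/local/
```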

Explorer

@Artem Ervits

OK. So once I have copied the data from the HDFS/OS mount points of the old nodes to the new nodes, I can start decommissioning the old nodes one by one. Please correct me if I am wrong.

Thanks

Mentor

@rahul gulati

You only need to copy the OS data. Once you add the new nodes to your cluster, issue the HDFS rebalance command. Once that is done, go to a node you intend to remove and hit decommission; HDFS will start moving data off that node across the cluster. Continue with the next node until all old nodes are removed. I highly recommend you add more new nodes for DataNode purposes: 3 DataNodes is just not enough, and you basically have no resiliency. The more nodes you have, the faster this whole project will go for you. Also look at http://sequenceiq.com/cloudbreak-docs/latest/ ; this will make management of your cluster in the cloud a lot easier, including this topic.
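When decommissioning one node at a time as described here, it helps to wait until the node actually reports "Decommissioned" before starting the next. A sketch of that wait loop: `node_state` is a mocked stand-in for parsing real `hdfs dfsadmin -report` output, and the hostname is hypothetical:

```shell
# Wait for one node to finish decommissioning before moving to the next.
node_state() {
  # On a live cluster, parse the admin report instead, e.g.:
  #   hdfs dfsadmin -report | grep -A 3 'Name: old-slave-1.example.com' \
  #     | grep 'Decommission Status'
  echo "Decommissioned"   # mocked here so this sketch terminates immediately
}

until [ "$(node_state)" = "Decommissioned" ]; do
  sleep 60   # poll once a minute
done
echo "old-slave-1.example.com is decommissioned; safe to stop it and move on"
```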

Explorer

@Artem Ervits

I have already explored the Cloudbreak option and got a cluster up and running; it is really fast. But the problem with Cloudbreak is that it does not deploy a cluster onto VMs already running in Azure; it launches new VMs to set up the cluster. We already have VMs running in Azure on which we want to set up the cluster, which is why we are planning to go with an Ambari installation.

And on the second point: since our data volume in the dev environment is low, we are keeping 3 DataNodes and would set the replication factor to 2. Will that work fine?
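If you do run with replication factor 2 in dev, the cluster-wide default lives in hdfs-site.xml (editable under Ambari's HDFS configs). A minimal fragment; note that `dfs.replication` applies only to newly written files, so existing files keep their factor until you change it explicitly (e.g. `hdfs dfs -setrep -w 2 /`):

```xml
<!-- hdfs-site.xml: default replication for newly created files -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```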

Thanks

Mentor

For dev purposes I guess that's fine. True about Cloudbreak, but it lets you grow and shrink your cluster easily, and you can specify the machine type the next time you add a node. Either way is fine, but I'm glad you have already entertained the idea of Cloudbreak.

Explorer

@Artem Ervits

Is it possible to use Cloudbreak to add nodes to a cluster deployed with Ambari? My understanding is that Cloudbreak can only add nodes to clusters spun up through the Cloudbreak UI.

If yes, are there any reference links on adding/removing nodes with Cloudbreak on a cluster deployed using Ambari?

Thanks

Mentor

@rahul gulati No, you can't use Cloudbreak to add nodes to an existing cluster deployed by Ambari. To add nodes to a Cloudbreak-provisioned cluster, you can use the UI, or the Cloudbreak CLI for automation purposes.

Explorer

@Artem Ervits

Yes, that's my understanding as well. Thanks for confirming.

By any chance, is there a plan for Cloudbreak to support deploying a cluster on already-running VMs in the cloud?

Mentor
@rahul gulati

Not sure; I'm not privy to the Cloudbreak roadmap. Perhaps you'd want to open this as a new HCC question so someone from the Cloudbreak team can respond?

Explorer

@Artem Ervits

OK, sure. I also wanted to ask about the NameNode, in case it too needs an upgrade to a VM with more memory and CPU. What would be the ideal approach in that case?

Thanks

Mentor

Add the two new nodes; then, in the HDFS section of Ambari, there is an option to move the NameNode.

New Contributor

@Artem Ervits

> Both can add and remove instances as well as provision new instances with new machine type easily.

Could you please point out where that option is located in the Cloudbreak UI or CLI? Thank you!
