Guidelines on Replacing Nodes in Hadoop cluster running on Azure VM's
Labels: Apache Ambari, Apache Hadoop
Created 03-07-2017 10:34 AM
We have a 6-node cluster (2 master nodes, 3 slave nodes, and 1 edge node) on Azure VMs, deployed using Ambari. I need to replace the old VMs with VMs of larger size and memory. What would be the best strategy for replacing the existing 3 slave nodes with a new set of slave nodes of a different instance type? Since we have some data in HDFS, what would be the best strategy to retain the data and bring the cluster up with the new nodes?
Would it be advisable to replace the NameNode as well, or just the slave nodes and the edge node?
Created 03-07-2017 12:45 PM
You should add the new nodes to the cluster and rebalance it, then decommission each old node one at a time. Since you're replacing three nodes and that's all of your HDFS capacity, it may take some time. Consider a larger footprint, since HDFS maintains a replication factor of 3. You may be best served by a tool like Cloudbreak, or, if you're only running ETL, discovery, or data-science workloads, you can try Hortonworks Data Cloud. Both can easily add and remove instances, as well as provision new instances with a new machine type.
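The add/rebalance/decommission cycle described above can be sketched with standard HDFS admin commands. This is only an illustrative outline: the hostname and exclude-file path are placeholder assumptions, and the exact exclude-file location depends on how Ambari has configured `dfs.hosts.exclude` in your cluster.

```
# 1. After registering the new DataNodes through Ambari, spread existing
#    blocks onto them. -threshold is the allowed percentage deviation
#    from the cluster-average disk usage.
hdfs balancer -threshold 10

# 2. Decommission one old node at a time: add it to the excludes file
#    referenced by dfs.hosts.exclude in hdfs-site.xml (path is a
#    placeholder here) ...
echo "old-dn-1.example.com" >> /etc/hadoop/conf/dfs.exclude

# 3. ... then tell the NameNode to re-read the file. HDFS re-replicates
#    that node's blocks elsewhere before marking it Decommissioned.
hdfs dfsadmin -refreshNodes

# 4. Watch progress; remove the node from the cluster only once its
#    state shows "Decommissioned".
hdfs dfsadmin -report
```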
Created 03-07-2017 03:05 PM
Thanks for replying. After adding 2-3 new slave nodes to the cluster using Ambari, will the data be automatically copied to the new nodes? By data I mean both the HDFS data and the root/OS partition data. And will the nodes be rebalanced automatically by Ambari, or does that have to be done manually?
Created 03-07-2017 03:36 PM
@rahul gulati No, the data will not be copied automatically: newly written data will start landing on the new DataNodes, but existing data will not move. You will have to invoke the balancer yourself via Ambari: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_Ambari_Users_Guide/content/_how_to_rebal...
The data residing on the OS partitions has to be copied manually by your team.
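The manual OS-level copy mentioned above can be scripted with rsync. A minimal sketch, in which the hostname, user, and paths are purely hypothetical examples of non-HDFS data worth carrying over:

```
# Copy non-HDFS data (home directories, local scripts, cron jobs) from
# an old slave node to its replacement. Run from the old node; hostname
# and paths are placeholders.
rsync -avz /home/etl/ new-dn-1.example.com:/home/etl/
rsync -avz /etc/cron.d/ new-dn-1.example.com:/etc/cron.d/
```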
Created 03-07-2017 03:57 PM
OK. So once I have copied the data from the HDFS/OS mount points of the old nodes to the new nodes, I can start decommissioning the old nodes one by one. Please correct me if I am wrong.
Thanks
Created 03-07-2017 04:02 PM
You only need to copy the OS data. Once you add the new nodes to your cluster, issue the HDFS rebalance command. When that's done, go to a node you intend to remove and hit decommission; HDFS will start moving data from that node across the cluster. Continue with the next node until all old nodes are removed. I highly recommend you add more nodes for DataNode purposes: 3 DataNodes is just not enough, and you basically have no resiliency. The more nodes you have, the faster this whole project will go for you. Also look at http://sequenceiq.com/cloudbreak-docs/latest/ ; it will make managing your cluster in the cloud a lot easier, including for tasks like this one.
Created 03-07-2017 04:10 PM
I have already explored the Cloudbreak option and got a cluster up and running; it is really fast. But the problem with Cloudbreak is that it does not launch a cluster on already-running VMs in Azure; it launches new VMs to set up the cluster. We already have VMs running in Azure on which we want to set up the cluster. That's why we are planning to go with an Ambari installation.
And on the second point: in the dev environment, since our data volume is small, we are keeping 3 data nodes and would set the replication factor to 2. Will that work fine?
Thanks
Created 03-07-2017 04:16 PM
For dev purposes I guess that's fine. True about Cloudbreak, but it lets you grow and shrink your cluster easily, and you can specify the machine type the next time you add a node. Either way is fine, but I'm glad you already entertained the idea of Cloudbreak.
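To make the replication-factor trade-off discussed above concrete, here is some back-of-the-envelope arithmetic for a 3-DataNode dev cluster with replication factor 2. The 4 TB per-node disk size is an illustrative assumption, not a value from this thread.

```python
def usable_capacity_tb(nodes: int, per_node_tb: float, replication: int) -> float:
    """Raw cluster capacity divided by the replication factor."""
    return nodes * per_node_tb / replication

def tolerable_node_losses(replication: int) -> int:
    """Data stays readable while at least one replica survives, so a
    cluster tolerates (replication - 1) simultaneous node losses."""
    return replication - 1

# 3 nodes x 4 TB each (hypothetical), RF=2 -> 6.0 TB usable
print(usable_capacity_tb(3, 4.0, 2))   # 6.0
# RF=2 survives only a single node failure at a time
print(tolerable_node_losses(2))        # 1
```

So dropping from RF=3 to RF=2 buys usable capacity at the cost of tolerating only one node failure, which is usually acceptable in dev but not in production.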
Created 03-07-2017 04:23 PM
Is it possible to use Cloudbreak to add nodes to a cluster deployed using Ambari? My understanding is that Cloudbreak can only add nodes to a cluster spun up through the Cloudbreak UI.
If it is possible, are there any reference links on adding/removing nodes with Cloudbreak on a cluster deployed using Ambari?
Thanks
Created 03-07-2017 04:31 PM
@rahul gulati No, you can't use Cloudbreak to add nodes to an existing cluster deployed by Ambari. To add nodes to a Cloudbreak-provisioned cluster, you can use the UI, or the Cloudbreak CLI for automation purposes.