Minimum number of nodes to add in a multi-node cluster
Created on ‎10-11-2016 09:16 AM - edited ‎09-16-2022 03:44 AM
I have the following services:
1. Hive
2. Pig
3. ZooKeeper
4. HDFS
5. Hue
6. Oozie
7. Sqoop
8. YARN
9. Ranger
Currently, all of these are deployed on the same host. Now I would like to add more hosts to the cluster.
But I have a few doubts.
In production:
1. A node means a server, right? No VMs?
2. How many servers would I need to add to have a healthy cluster?
3. Which of the above-mentioned services should be co-located?
4. What should the distribution look like?
Pig is used relatively little, but Sqoop, Hive, Oozie, and Hue are in use most of the time, and of course Ranger handles authorization.
Which of these services should be moved to new hosts, and which should stay together?
Which of them should have a server entirely dedicated to them? I am new to this and would appreciate it if you could give the specifications for establishing a multi-node cluster.
Created ‎10-11-2016 09:58 AM
Please find the answers inline:
1. A node means a server, right? No VMs?
- A node means a server. A server can be either physical hardware or a virtual machine.
2. How many servers would I need to add to have a healthy cluster?
- It depends on the type of configuration you use for production, which is a broad topic. I would recommend deploying the master services on dedicated nodes and adding slave nodes as your requirements dictate.
If you enable HA, you need to revisit the placement of the above services, for example:
master1 - Active NN, ZK, JN
master2 - Standby NN, ZK, JN, RM, AM, HS
master3 - Ambari, ZK, Hive, Sqoop, Oozie, Hue, Ranger, etc.
Slave nodes - DN, NM, etc.
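The layout above can be written down as data, which makes it easier to check which hosts share a component. This is only a rough sketch: it assumes the three master lines refer to three distinct hosts (called master1, master2, and master3 here), that the second slave component is the NodeManager (NM), and that there are two slave hosts. The host names, the slave count, and the helper functions are all made up for illustration.

```python
from collections import defaultdict

# Illustrative layout only. Host names, the number of slave hosts, and the
# "NM" abbreviation are assumptions; the other abbreviations are taken from
# the placement listed above.
LAYOUT = {
    "master1": ["NN", "ZK", "JN"],
    "master2": ["NN", "ZK", "JN", "RM", "AM", "HS"],
    "master3": ["Ambari", "ZK", "Hive", "Sqoop", "Oozie", "Hue", "Ranger"],
    "slave1": ["DN", "NM"],
    "slave2": ["DN", "NM"],
}


def hosts_by_component(layout):
    """Invert the host -> components mapping into component -> hosts."""
    index = defaultdict(list)
    for host, components in layout.items():
        for component in components:
            index[component].append(host)
    return dict(index)


def colocated(layout, first, second):
    """Return the hosts that run both of the given components."""
    return sorted(
        host
        for host, components in layout.items()
        if first in components and second in components
    )


if __name__ == "__main__":
    index = hosts_by_component(LAYOUT)
    for component in sorted(index):
        print(f"{component}: {', '.join(index[component])}")
    print("NN and JN share:", ", ".join(colocated(LAYOUT, "NN", "JN")))
```

Keeping the plan in one structure like this lets you adjust the placement on paper before moving anything in the real cluster.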
3. Which of the above-mentioned services should be co-located?
- For HDFS, the JournalNodes (JN) would typically run on both NameNode hosts. If possible, also give JN and ZK a dedicated disk.
4. What should the distribution look like?
- You can go for an n-1 distribution, where n is the latest stable HDP release.
You can migrate services after installation.
