Created on 10-11-2016 09:16 AM - edited 09-16-2022 03:44 AM
I have
1. Hive
2. Pig
3. Zookeeper
4. HDFS
5. Hue
6. Oozie
7. Sqoop
8. Yarn
9. Ranger
Currently, all of these are deployed on the same host. Now, I would like to add more hosts to it.
But I have a few doubts:
In production,
1. A node means a server, right? No VMs?
2. How many servers would I need to add to have a healthy cluster?
3. Which of the above mentioned services should be co-located?
4. What should be the distribution like?
Pig is used relatively little, but Sqoop, Hive, Oozie and Hue are used most of the time, and of course Ranger for authorization.
What should the distribution look like? Which of these services should be moved to new hosts? Which of these should be co-located?
Which of these should have an entire server dedicated to them? I am new to this and would appreciate it if you could give the specifications for establishing a multi-node cluster.
Created 10-11-2016 09:58 AM
Please find answers inline -
1. A node means a server, right? No VMs?
- A node means a server. A server can be physical hardware or a virtual machine.
2. How many servers would I need to add to have a healthy cluster?
- It depends on the type of configuration you use for production, which is a broader topic to discuss. I would recommend deploying master services on dedicated nodes and adding slave nodes as per your requirement.
If you enable HA, you need to revisit the placement of the above services. For example:
master1 - Active NN, ZK, JN
master2 - Standby NN, ZK, JN, RM, AM, HS
master3 - Ambari, ZK, Hive, Sqoop, Oozie, Hue, Ranger, etc.
Slave nodes - DN, NM, etc.
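A layout like the one above can be written down and sanity-checked before you start moving services. The snippet below is only a minimal sketch: it assumes three distinct master hosts (the host names and the helper functions `hosts_by_component` and `has_majority_quorum` are made up for illustration and are not part of Ambari or HDP), and it simply inverts the plan to show which hosts carry each component.

```python
from collections import defaultdict

# Hypothetical host names; component abbreviations follow the layout above.
layout = {
    "master1": ["NN", "ZK", "JN"],
    "master2": ["NN", "ZK", "JN", "RM", "AM", "HS"],
    "master3": ["AMBARI", "ZK", "HIVE", "SQOOP", "OOZIE", "HUE", "RANGER"],
}


def hosts_by_component(layout):
    """Invert host -> components into component -> sorted list of hosts."""
    placement = defaultdict(list)
    for host, components in layout.items():
        for component in components:
            placement[component].append(host)
    return {component: sorted(hosts) for component, hosts in placement.items()}


def has_majority_quorum(host_count):
    """True when the count is odd and at least three, the usual
    recommendation for a ZooKeeper ensemble."""
    return host_count >= 3 and host_count % 2 == 1


placement = hosts_by_component(layout)
print("ZK hosts:", placement["ZK"])
print("ZK count looks fine:", has_majority_quorum(len(placement["ZK"])))
print("JN hosts:", placement["JN"])
```

The same count check could be applied to the JN list, since JournalNodes are also normally run in an odd number of three or more.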
3. Which of the above mentioned services should be co-located?
- For HDFS, the JNs should preferably run on both NameNode hosts. Also, if possible, give JN and ZK dedicated disks.
4. What should be the distribution like?
- You can go for the n-1 distribution [where n = the latest stable release from HDP].
You can migrate services after installation.