Minimum number of nodes to add in a multi-node cluster

Expert Contributor

I have the following services:

1. Hive

2. Pig

3. Zookeeper

4. HDFS

5. Hue

6. Oozie

7. Sqoop

8. Yarn

9. Ranger

Currently, all of these are deployed on the same host. Now, I would like to add more hosts to it.

But I have a few doubts:

In production,

1. A node means a server, right? No VMs?

2. How many servers would I need to add to have a healthy cluster?

3. Which of the above-mentioned services should be co-located?

4. What should the distribution look like?

Pig is used relatively rarely, but Sqoop, Hive, Oozie and Hue are used most of the time, and of course Ranger for authorization.

Which of these services should be moved to new hosts, and which should be co-located?

Which of these should have a server entirely dedicated to them? I am new to this and would appreciate it if you could give the specifications for establishing a multi-node cluster.

1 ACCEPTED SOLUTION

Super Guru
@Simran Kaur

Please find answers inline -

1. A node means a server, right? No VMs?

- A node means a server. A server can be either physical hardware or a virtual machine.

2. How many servers would I need to add to have a healthy cluster?

- It depends on the type of configuration you use for production, which is a broader question to discuss. I would recommend deploying the master services on dedicated nodes and adding slave nodes as your requirements dictate.

In case of HA, you need to revisit the placement of the above services, for example:

master1 - Active NN, ZK, JN

master2 - Standby NN, ZK, JN, RM, AM, HS

master3 - Ambari, ZK, Hive, Sqoop, Oozie, Hue, Ranger, etc.

Slave nodes - DN, NM, etc.
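As a rough illustration, the placement above can be written down as data and sanity-checked before installing anything. This is only a sketch: it assumes the three master entries are distinct hosts (the names master1 to master3 are hypothetical), the helper names are made up for this example, and the rule that ZooKeeper and JournalNode counts should be odd and at least three is taken from the general Hadoop HA guidance rather than from this thread.

```python
from collections import Counter

# Hypothetical host names; service abbreviations are copied from the layout above.
LAYOUT = {
    "master1": ["NN", "ZK", "JN"],
    "master2": ["NN", "ZK", "JN", "RM", "AM", "HS"],
    "master3": ["AMBARI", "ZK", "HIVE", "SQOOP", "OOZIE", "HUE", "RANGER"],
}


def count_instances(layout):
    """Count how many hosts run each service."""
    return Counter(svc for services in layout.values() for svc in services)


def check_ha(layout):
    """Return a list of human-readable problems; an empty list means no findings."""
    counts = count_instances(layout)
    problems = []
    # HDFS HA uses exactly two NameNodes (active + standby) in this layout.
    if counts["NN"] != 2:
        problems.append(f"NN: expected 2, found {counts['NN']}")
    # Assumption: quorum services need an odd number of members, at least three.
    for svc in ("ZK", "JN"):
        if counts[svc] < 3 or counts[svc] % 2 == 0:
            problems.append(f"{svc}: expected an odd count >= 3, found {counts[svc]}")
    return problems


if __name__ == "__main__":
    for problem in check_ha(LAYOUT):
        print(problem)
```

With the layout exactly as listed, the check reports only the JournalNode count, since JN appears on two hosts; placing a third JN on the remaining master would clear it under the assumption stated above.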

3. Which of the above mentioned services should be co-located?

- For HDFS, run the JournalNodes (JN) on the NameNode hosts, most likely on both. Also, if possible, give JN and ZK dedicated disks.

4. What should be the distribution like?

- You can go for n-1 distribution [where is n=latest stable release from hdp]

You can migrate services after installation.
