4 node cluster configuration


I'm setting up a 4-node cluster (one physical machine and three VMs). The physical machine is to be the master/edge node and the three VMs are to be the DataNodes. My question is: at the "Assign Masters" step in Ambari 2.5.0.3, do I keep everything on the intended master (including the Secondary NameNode) and run only one ZooKeeper Server/Metrics Collector/Activity Analyzer/Activity Explorer, or do I place the Secondary NameNode on one of the DataNodes, along with a ZooKeeper Server/Metrics Collector/Activity Analyzer/Activity Explorer on each of the DataNodes?

My intent is to have the physical machine act as the client/edge node and the VMs handle only data. Any advice is appreciated.

Thanks in advance.

1 ACCEPTED SOLUTION


Hi @Joshua Petree, what is the purpose of this cluster?

For any cluster beyond a dev sandbox, you need 3 to 5 masters. ZooKeeper needs at least three instances to function properly: it relies on a majority quorum, and three servers is the smallest ensemble that can survive the loss of one. It's not recommended to run a Secondary NameNode or any other master services, such as ZooKeeper, on a DataNode. Also, for HDFS to be HA, you need to run a Standby NameNode.

Remember that Hadoop is designed with the assumption that DataNodes will fail. If you start putting critical services on DataNodes, not only will it hurt your performance, it will create points of failure that will affect the overall health of the cluster.
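
To make the "at least three ZK instances" point concrete, here is a minimal sketch of the majority-quorum arithmetic. It only illustrates the counting; the function names are hypothetical and not part of any ZooKeeper or Ambari API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble (hypothetical helper)."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(
            f"{n} server(s): quorum of {quorum_size(n)}, "
            f"survives {tolerated_failures(n)} failure(s)"
        )
```

One or two servers tolerate no failures, three tolerate one, and five tolerate two. An even-sized ensemble tolerates no more failures than the next smaller odd size, which is why odd sizes are the usual choice.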



This is what I was expecting, but sadly this is what I've been given to work with. Thank you for your input. I'm hoping that if this build goes well, I can convince the "powers that be" to approve a bigger budget for a proper cluster. Thank you again.