4 node cluster configuration

I'm setting up a 4-node cluster (1 physical machine and 3 virtual machines). The physical machine is to be the master/edge node and the 3 VMs are to be the DataNodes. My question is about the "Assign Masters" step in Ambari 2.5.0.3: do I keep everything on the intended master (including the Secondary NameNode) and run only one ZooKeeper server/metrics collector/activity analyzer/activity explorer, or do I place the Secondary NameNode on one of the DataNodes, along with a ZooKeeper server/metrics collector/activity analyzer/activity explorer on each of the DataNodes?

My intent is to have the physical machine act as the client/edge node and the VMs just handle data. Any advice is appreciated.

Thanks in advance.

1 ACCEPTED SOLUTION

Hi @Joshua Petree, what is the purpose of this cluster?

For any cluster that's beyond a Dev sandbox, you need 3 to 5 masters. For ZooKeeper to keep its quorum when a server fails, you need at least three ZK instances. It's not recommended to run a Secondary NameNode or any other master services, such as ZK, on a DataNode. Also, for HDFS to be HA, you need to run a Standby NameNode; a Secondary NameNode does not provide failover.

Remember that Hadoop is designed with the assumption that DataNodes will fail. If you start putting critical services on DataNodes, not only will it hurt your performance, it will create points of failure that will affect the overall health of the cluster.
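
To make the "at least three ZK instances" point concrete, here is a minimal sketch of the majority-quorum arithmetic ZooKeeper relies on. The function names `quorum_size` and `tolerated_failures` are hypothetical helpers for illustration only, not part of any ZooKeeper or Ambari API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a few ensemble sizes side by side.
    for n in (1, 3, 4, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

A single ZK server tolerates no failures, three tolerate one, and five tolerate two. A fourth server adds nothing over three, which is why odd ensemble sizes are the usual choice.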

2 REPLIES

This is what I was expecting, but sadly this is what I've been given to work with. Thank you for your input. I'm hoping that if this build goes well, I can convince the "powers that be" to approve a bigger budget for a proper cluster. Thank you again.