4 node cluster configuration

I'm setting up a 4-node cluster (1 physical machine and 3 VMs). The physical machine is to be the master/edge node and the 3 VMs are to be the DataNodes. My question is about the "Assign Masters" step in Ambari 2.5.0.3: do I keep everything on the intended master (including the Secondary NameNode) and run only one ZooKeeper Server, Metrics Collector, Activity Analyzer, and Activity Explorer? Or do I place the Secondary NameNode on one of the DataNodes and put a ZooKeeper Server, Metrics Collector, Activity Analyzer, and Activity Explorer on each of the DataNodes?

My intent is to have the physical machine act as the client/edge node and the VMs just handle data. Any advice is appreciated.

Thanks in advance.

Accepted Solution

Re: 4 node cluster configuration

Hi @Joshua Petree, what is the purpose of this cluster?

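For any cluster beyond a Dev sandbox, you need 3 to 5 master nodes. For ZooKeeper to function properly, you need at least three ZK instances, because the ensemble only stays available while a majority of its servers are up. It's not recommended to run a Secondary NameNode or any other master service, such as ZK, on a DataNode. Also, for HDFS to be highly available (HA), you need to run a Standby NameNode; the Secondary NameNode only checkpoints the NameNode's metadata and cannot take over if the NameNode fails.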
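
The arithmetic behind the three-instance minimum can be sketched as follows. This is a minimal illustration, assuming only the standard ZooKeeper rule that a strict majority (quorum) of the ensemble must be up; the function names are hypothetical and not part of any ZooKeeper API.

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble of `ensemble` ZooKeeper servers."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """Servers that can fail while a majority is still available."""
    return ensemble - quorum_size(ensemble)


# Show how fault tolerance grows with ensemble size.
for n in range(1, 6):
    print(f"{n} server(s): quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that one or two servers tolerate no failures, three tolerate one, and four still tolerate only one, which is why ensembles are usually sized at an odd number such as 3 or 5.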

Remember that Hadoop is designed on the assumption that DataNodes will fail. If you put critical services on DataNodes, you will not only hurt performance but also create points of failure that affect the overall health of the cluster.
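
To make the advice concrete, a planned host layout can be checked for master services co-located with DataNodes before running the Ambari wizard. This is a hypothetical sketch: the component names follow Ambari-style naming, but the `MASTER_COMPONENTS` set, the helper function, and the host names are assumptions for illustration, not an Ambari API.

```python
# Assumed set of "master" components that should not share a host with a DataNode.
MASTER_COMPONENTS = {
    "NAMENODE",
    "SECONDARY_NAMENODE",
    "ZOOKEEPER_SERVER",
    "METRICS_COLLECTOR",
}


def masters_on_datanodes(layout: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per host, any master components placed alongside a DataNode."""
    flagged = {}
    for host, components in layout.items():
        if "DATANODE" in components:
            overlap = components & MASTER_COMPONENTS
            if overlap:
                flagged[host] = overlap
    return flagged


# Hypothetical layout: one physical master plus three VM DataNodes,
# with the Secondary NameNode and ZooKeeper servers spread onto the VMs.
layout = {
    "master1": {"NAMENODE", "ZOOKEEPER_SERVER", "METRICS_COLLECTOR"},
    "vm1": {"DATANODE", "SECONDARY_NAMENODE", "ZOOKEEPER_SERVER"},
    "vm2": {"DATANODE", "ZOOKEEPER_SERVER"},
    "vm3": {"DATANODE"},
}

for host, comps in sorted(masters_on_datanodes(layout).items()):
    print(host, sorted(comps))
```

Here `vm1` and `vm2` are flagged, which is exactly the kind of placement the answer above warns against.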

Re: 4 node cluster configuration

This is what I was expecting, but sadly this is what I have to work with. Thank you for your input. I'm hoping that if this build goes well, I can convince the "powers that be" to approve a bigger budget for a proper cluster. Thank you again.