Recommendation of a Multi-Node cluster

Need recommendation for a small 7 Node cluster. Below is what I am planning to do:

MasterNode: NameNode, ResourceManager, HBase Master, Oozie Server, ZooKeeper Server

DataNode: DataNode, NodeManager, RegionServer

Web interface: Ambari server / HUE interface / Zeppelin / Ranger

Gateway Node: All the clients (HDFS, Hive, Spark, Pig, Mahout, Tez etc)

SecondaryNode: Secondary NameNode, HiveServer2, MySQL, WebHCat server, HiveMetaStore

Are there any issues with this configuration? Also, do we need the clients on all the machines?

Should I go with HDP 2.3 or 2.4 ?

Thanks

Prakash

1 ACCEPTED SOLUTION

Re: Recommendation of a Multi-Node cluster

"Should I go with HDP 2.3 or 2.4 ?"

I would lean toward 2.4 here, although it depends somewhat on what you need. An older 2.3 release may offer some extra stability, but 2.4 brings a lot of security features (Kerberos for Kafka -> Spark Streaming, etc.), a newer Spark release, and other goodies. Also, upgrading to a point version is normally easier than jumping releases. But again, it depends on your needs.

"Any issue with this configuration. Also do we need the client on all the machines "

In most cases, no. (A Sqoop action with Hive in Oozie needs the Hive clients on all nodes, but that is an exception.)

Regarding your node distribution:

How many datanodes are you planning? I see 5 different node types, so I assume you want 3 master nodes, one edge node, and 3 datanodes?

That may make sense if you plan to grow the cluster later, but if you want to get the maximum amount of work done I would rather go for 2 master nodes, perhaps even reusing one of them as the edge node, and have a decent number of datanodes. Obviously this depends on your server size as well, but big modern servers with 12+ cores and 256GB of RAM can host an awful lot of master components at the same time without creating a bottleneck. Others may disagree with me here, but I once set up a cluster with 7 datanodes plus 1 combined master+edge node (I didn't design it), and it worked fine as long as you do not expect constant uptime for your cluster. Colocating this many services increases the chance of something going wrong and of a single server reboot bringing down the whole cluster, so it's nothing you would do for a mission-critical system that cannot go down. If you have much smaller servers, then you might need more master nodes as well.
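To make the "2 masters, rest datanodes" idea concrete, here is a rough Python sketch of how a 7-node layout could be written down and checked. The host names (`master1`, `master2`, `data1`...) and the exact split of services between the two masters are my own assumptions for illustration, not a recommendation from any official guide; the service names are the ones from the question, and some (HUE, Zeppelin, Ranger) are left out for brevity.

```python
# Sketch only: host names and the split of master services are assumptions.
WORKER_ROLES = {"DataNode", "NodeManager", "RegionServer"}


def build_layout(num_datanodes):
    """Return a dict mapping host name -> set of services on that host."""
    layout = {
        # First master: core HDFS/YARN/HBase masters (assumed grouping).
        "master1": {
            "NameNode", "ResourceManager", "HBase Master",
            "Oozie Server", "ZooKeeper Server",
        },
        # Second master doubles as the edge node, so it also gets the clients.
        "master2": {
            "Secondary NameNode", "HiveServer2", "Hive Metastore",
            "MySQL", "WebHCat Server", "Ambari Server", "Clients",
        },
    }
    for i in range(1, num_datanodes + 1):
        layout[f"data{i}"] = set(WORKER_ROLES)
    return layout


def hosts_running(layout, role):
    """List the hosts that run a given service."""
    return sorted(host for host, roles in layout.items() if role in roles)


def single_host_roles(layout):
    """Services that live on exactly one host, i.e. lost if that host reboots."""
    counts = {}
    for roles in layout.values():
        for role in roles:
            counts[role] = counts.get(role, 0) + 1
    return sorted(role for role, n in counts.items() if n == 1)


if __name__ == "__main__":
    layout = build_layout(5)  # 2 masters + 5 datanodes = 7 nodes
    print("DataNode hosts:", hosts_running(layout, "DataNode"))
    print("Single-host services:", single_host_roles(layout))
```

The `single_host_roles` check is just a way to see the uptime trade-off mentioned above: every master service in this layout sits on one host, so a reboot of that host takes the service down with it.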

Re: Recommendation of a Multi-Node cluster

Thank you so much, Benjamin. We are starting with a small cluster with 2 MasterNodes (32GB each), 4 DataNodes, and 1 EdgeNode.