Recommendation of a Multi-Node cluster

Expert Contributor

I need a recommendation for a small 7-node cluster. Below is what I am planning to do:

MasterNode: NameNode, ResourceManager, HBase Master, Oozie Server, Zookeeper Server

DataNode: DataNode, NodeManager, RegionServer

Web interface: Ambari server / HUE interface / Zeppelin / Ranger

Gateway Node: All the clients (HDFS, Hive, Spark, Pig, Mahout, Tez etc)

SecondaryNode: Secondary NameNode, HiveServer2, MySQL, WebHCat server, HiveMetaStore

Is there any issue with this configuration? Also, do we need the clients on all the machines?

Should I go with HDP 2.3 or 2.4 ?

Thanks

Prakash

1 ACCEPTED SOLUTION

Master Guru

"Should I go with HDP 2.3 or 2.4 ?"

I would lean toward 2.4 here, although it depends a bit on what you need. An older 2.3 release may provide some extra stability, but 2.4 brings a lot of security features (Kerberos for Kafka->Spark Streaming, etc.), a new Spark release, and other goodies. Also, upgrading to a point version is normally easier than jumping releases. But again, it depends on your needs.

"Any issue with this configuration. Also do we need the client on all the machines "

In most cases, no. (A Sqoop action with Hive in Oozie needs the Hive clients on all nodes, but that is an exception.)

Regarding your node distribution:

How many datanodes are you planning? I see 5 different node types, so I assume you want 3 master nodes, one edge node, and 3 datanodes?

That may make sense if you plan to grow the cluster later, but if you want to get the maximum amount of work done, I would rather go for 2 master nodes, perhaps even reusing one of them as the edge node, and have a decent number of datanodes. Obviously this depends on your server size as well, but big modern servers with 12+ cores and 256GB of RAM can host an awful lot of master components at the same time without creating a bottleneck.

Others may disagree with me here, but I once set up a cluster with 7 datanodes plus 1 combined master+edge node (I didn't design it), and it worked fine as long as you do not expect constant uptime for your cluster. Colocating this many services increases the chance of something going wrong and bringing down the whole cluster because of a server reboot, so it's nothing you would do for a mission-critical system that cannot go down. If you have much smaller servers, then you might need more master nodes as well.
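
To make the colocation trade-off above a bit more concrete, here is a minimal Python sketch. It is only an illustration: the host names (master1, secondary1, web1, gateway1) are hypothetical, the service lists are copied from the layout in the question, and counting stopped services is a crude stand-in for real availability planning.

```python
# Hypothetical layout: host name -> services placed on it.
# Host names are made up; service lists follow the plan in the question.
SPREAD_LAYOUT = {
    "master1": ["NameNode", "ResourceManager", "HBase Master",
                "Oozie Server", "ZooKeeper Server"],
    "secondary1": ["Secondary NameNode", "HiveServer2", "MySQL",
                   "WebHCat Server", "Hive Metastore"],
    "web1": ["Ambari Server", "Hue", "Zeppelin", "Ranger"],
    "gateway1": ["Clients"],
}

# Same services, all colocated on a single (hypothetical) master+edge host.
COLOCATED_LAYOUT = {
    "master1": [svc for services in SPREAD_LAYOUT.values() for svc in services],
}


def services_lost(layout, down_host):
    """Return the sorted list of services that stop if down_host reboots."""
    return sorted(layout.get(down_host, []))


def worst_case_outage(layout):
    """Return (host, count) for the host whose reboot stops the most services."""
    host = max(layout, key=lambda h: len(layout[h]))
    return host, len(layout[host])


if __name__ == "__main__":
    for name, layout in [("spread", SPREAD_LAYOUT), ("colocated", COLOCATED_LAYOUT)]:
        host, count = worst_case_outage(layout)
        print(f"{name}: rebooting {host} stops {count} services")
```

The point is only that with everything on one box a single reboot takes out every master service at once, while spreading them limits the damage per host at the cost of fewer machines left for datanodes.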


Expert Contributor

Thank you so much, Benjamin. We are starting with a small cluster with 2 MasterNodes (32GB each), 4 DataNodes, and 1 EdgeNode.