
Multi Node Cluster Recommendation


I need a recommendation for setting up an HDP 2.5 multi-node cluster using Ambari (on Azure VMs).

Nodes: 1 edge node, 2 NameNodes (primary and secondary), 3 DataNodes

The services I am planning to install are as follows:

Master Node: NameNode, ResourceManager, HBase Master, Oozie Server, ZooKeeper Server, PostgreSQL (for Ambari and Ranger DB), Ranger, Apache Atlas, Apache Spark, History Server, Spark Job History Server

Secondary Node: Secondary NameNode, HiveServer2, MySQL, WebHCat Server, Hive Metastore (PostgreSQL DB?)

Data Node: DataNode, NodeManager, RegionServer

Gateway Node: All the clients (HDFS, Hive, Spark, Pig, Sqoop, Tez, YARN, HBase, etc.)

Later on I will be adding SAP HANA Vora services on top of these, so I want to choose RAM, CPU, and disk sizes such that I do not run into space or memory issues.

What would be a good configuration in this case? How should I distribute the above services across the cluster nodes? Should I install the clients on all nodes?

Please recommend.

Thanks

Rahul


Re: Multi Node Cluster Recommendation


Hi @rahul gulati,

It is hard to tell what the optimal sizing is for you, and I am afraid I cannot give you such numbers, since it depends on how much data you would like to store, what jobs you are planning to run, and whether they are memory, CPU, or disk intensive. If you don't have a massive amount of data, then since you are in the cloud it may also be a good approach to start with some instances and, if they turn out to be too small, launch a new cluster with larger master nodes and copy all of your data over.

Hortonworks has an HDCloud product, which is based on Cloudbreak, and the sizing guidelines for that product could be a good starting point for you: http://hortonworks.github.io/hdp-aws/create/index.html

Attila


Re: Multi Node Cluster Recommendation


@Attila Kanto

Thanks for replying. We will be provisioning the cluster using Ambari instead of Cloudbreak, and we will not be handling a very large data volume; it will be around 1-2 TB. You mentioned that in the cloud we can start new instances with a larger configuration (RAM and storage). That sounds great. Which approach would be preferable in that case for backing up the data from the already running cluster to the new cluster?

There would be OS disks, permanent disks, and files to be backed up. Are there any guidelines or reference links on backing up the data from the 6 already running VMs to new VMs? And what if the IPs or hostnames of the new VMs change? I think in that case we would need to install Ambari Server and the Ambari agents again. Please correct me if I am wrong.

Thanks
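
For the 1-2 TB volume and 3 DataNodes mentioned in this thread, a rough back-of-the-envelope disk estimate can be sketched as below. This is only an illustration, not an official sizing formula: the replication factor of 3 is the HDFS default, while the 25% allowance for temporary/intermediate data and the 70% maximum fill level are assumptions chosen for the example, and the function names are hypothetical.

```python
def raw_hdfs_capacity_tb(logical_tb, replication=3, temp_overhead=0.25, max_fill=0.70):
    """Estimate the total raw disk (in TB) needed across all DataNodes.

    logical_tb    -- amount of data before replication
    replication   -- HDFS replication factor (HDFS default is 3)
    temp_overhead -- ASSUMED extra fraction for intermediate/temporary data
    max_fill      -- ASSUMED maximum fraction of disk you want to fill
    """
    replicated = logical_tb * replication           # every block is stored `replication` times
    with_temp = replicated * (1 + temp_overhead)    # leave room for job scratch space
    return with_temp / max_fill                     # keep disks below the fill threshold


def per_datanode_tb(logical_tb, datanodes, **kwargs):
    """Spread the raw capacity evenly over the DataNodes."""
    return raw_hdfs_capacity_tb(logical_tb, **kwargs) / datanodes


if __name__ == "__main__":
    for data_tb in (1, 2):
        total = raw_hdfs_capacity_tb(data_tb)
        each = per_datanode_tb(data_tb, datanodes=3)
        print(f"{data_tb} TB of data: about {total:.1f} TB raw, {each:.1f} TB per DataNode")
```

Adjust the two assumed fractions to match the actual workload; growth over time and the later SAP HANA Vora services are not accounted for here.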
