Architecture Question: "Best practices"/ tips to deploy Frameworks in a cluster

New Contributor

Hi all

I need to set up a Hortonworks cluster for an IoT use case. All devices will send data to the cluster, and then we want to do analysis with Spark (MLlib) and implement real-time monitoring with Spark queries behind a web GUI (querying this data on user input). For data ingestion we'll try out Apache NiFi. The data load is ~2 GB per day (the "real" load, before replication).

The following other components are also planned to be installed:

  • HDFS (obviously)
  • YARN
  • Zeppelin
  • Spark
  • Ambari
  • Hive
  • Ambari Infra
  • Ambari Metrics
  • Tez
  • Falcon
  • Knox
  • Ranger
  • Atlas

For the cluster we want to start with 3 nodes (adding more shouldn't be a problem if needed) with the following specs:

  • 4 cores, 1 TB disk space, and 32 GB RAM per node

We use VMware, so scaling up shouldn't be a problem, but physical servers are unfortunately not possible in this case.

Now to my question: Are there "best practices" or tips on how to distribute and deploy the frameworks/services named above? What is a good choice? Just put all frameworks on every node? Should we use additional nodes? Should we dedicate a node to just one of the above frameworks?

Thanks in advance!

Kind regards

3 REPLIES

Re: Architecture Question: "Best practices"/ tips to deploy Frameworks in a cluster

New Contributor

Hi Tim,

I am sharing a few links that may help you set up your cluster:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_cluster-planning-guide/bk_cluster-planni...
https://community.hortonworks.com/articles/62667/zookeeper-sizing-and-placement-draft.html

Apart from this, I recommend going with machines that have at least 8 or 16 cores.

Regards

Niranjan

Re: Architecture Question: "Best practices"/ tips to deploy Frameworks in a cluster

New Contributor

Hi Niranjan

Thank you for the links, they're very helpful.

What about the number of nodes? What do you think would be a good start? Maybe 3 nodes are too few.

Regards,

Tim

Re: Architecture Question: "Best practices"/ tips to deploy Frameworks in a cluster

New Contributor

Hi Tim,

The number of nodes depends on your use case/POC, data volume, cluster usage, high-availability requirements, etc.

I think it is good to start with 5 nodes (2 master nodes and 3 data nodes). That gives you the option to enable HA, lets you balance the workload across the 2 master nodes, and lets you replicate the data with a replication factor of 3. After that, you can add nodes as and when needed.
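To put the numbers from this thread together, here is a rough back-of-the-envelope sketch of how long 3 data nodes with 1 TB each would last at ~2 GB of new data per day with a replication factor of 3. The function name `days_until_full` and the 70% usable-disk fraction (headroom for the OS, logs, and temporary Spark/YARN data) are assumptions made for illustration, not figures from the Hortonworks documentation.

```python
def days_until_full(daily_ingest_gb, replication, data_nodes,
                    disk_tb_per_node, usable_fraction=0.7):
    """Estimate how many days of ingest fit on the data nodes.

    usable_fraction is an assumed share of raw disk that is actually
    available to HDFS; adjust it to match your own setup.
    """
    raw_capacity_gb = data_nodes * disk_tb_per_node * 1000  # 1 TB taken as 1000 GB
    usable_gb = raw_capacity_gb * usable_fraction
    # Every ingested GB is stored `replication` times across the cluster.
    daily_footprint_gb = daily_ingest_gb * replication
    return usable_gb / daily_footprint_gb


# Figures from the thread: 2 GB/day, replication factor 3, 3 data nodes, 1 TB each.
days = days_until_full(daily_ingest_gb=2, replication=3,
                       data_nodes=3, disk_tb_per_node=1)
print(f"{days:.0f} days (~{days / 365:.1f} years)")
```

Under these assumptions the disks would last roughly 350 days, so storage is unlikely to be the first bottleneck; CPU and RAM for the many services are the more likely constraint. The estimate ignores compression, intermediate job output, and retention policies.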

Regards

Niranjan