Architecture Question: "Best practices"/ tips to deploy Frameworks in a cluster

New Contributor

Hi all

I need to set up a Hortonworks cluster for an IoT use case. All devices will send data to the cluster, and we then want to analyze it with Spark (MLlib) and implement real-time monitoring with Spark queries behind a web GUI (querying the data based on user input). For data ingestion we'll try out Apache NiFi. The data load is ~2 GB per day (the "real" load, before replication).

The following other components are also planned to be installed:

  • HDFS (obviously)
  • YARN
  • Zeppelin
  • Spark
  • Ambari
  • Hive
  • Ambari Infra
  • Ambari Metrics
  • Tez
  • Falcon
  • Knox
  • Ranger
  • Atlas

For the cluster we want to start with 3 nodes (adding more shouldn't be a problem if needed) with the following specs:

  • 4 cores, 1 TB disk space, and 32 GB RAM per node

We use VMware, so scaling up shouldn't be a problem, but physical servers are unfortunately not possible in this case.

Now to my question: Are there "best practices" or tips on how to distribute and deploy the frameworks/services named above? What is a good choice? Simply all frameworks on every node? Should we use additional nodes? Should we dedicate a node to just one of the above frameworks?

Thanks in advance!

Kind regards

3 REPLIES

Re: Architecture Question: "Best practices"/ tips to deploy Frameworks in a cluster

New Contributor

Hi Tim,

I am sharing a few links that may help you set up your cluster: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_cluster-planning-guide/bk_cluster-planni... https://community.hortonworks.com/articles/62667/zookeeper-sizing-and-placement-draft.html

Apart from this, I recommend going with at least 8- or 16-core machines.

Regards

Niranjan

Re: Architecture Question: "Best practices"/ tips to deploy Frameworks in a cluster

New Contributor

Hi Niranjan

Thank you for the links, they're very helpful.

What about the number of nodes? What do you think would be a good start? Maybe 3 nodes are too few.

Regards,

Tim

Re: Architecture Question: "Best practices"/ tips to deploy Frameworks in a cluster

New Contributor

Hi Tim,

The number of nodes depends on your use case/POC, data volume, cluster usage, high-availability requirements, etc.

I feel it is good to start with 5 nodes (2 master nodes and 3 data nodes). That gives you the option to enable HA, lets you balance the workload across the 2 master nodes, and lets you replicate the data with a replication factor of 3. After that you can add nodes as and when needed.
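
As a rough sanity check on storage, the figures from this thread (~2 GB per day, replication factor 3, 3 data nodes with 1 TB each) can be plugged into a back-of-the-envelope calculation. This is only a sketch: the function name `days_until_full` and the `usable_fraction` of 0.75 (headroom for the OS, logs, and temporary Spark/YARN data) are assumptions for illustration, not values from the Hortonworks documentation.

```python
def days_until_full(daily_ingest_gb, replication, data_nodes,
                    disk_per_node_gb, usable_fraction=0.75):
    """Estimate how many days of ingest fit on the data nodes.

    usable_fraction is an assumed share of raw disk left for HDFS blocks;
    adjust it to match your own reservation settings.
    """
    raw_capacity_gb = data_nodes * disk_per_node_gb
    usable_gb = raw_capacity_gb * usable_fraction
    # Every ingested GB is stored `replication` times across the cluster.
    daily_footprint_gb = daily_ingest_gb * replication
    return usable_gb / daily_footprint_gb


# Figures from this thread: 2 GB/day, replication 3, 3 data nodes, 1 TB each.
days = days_until_full(daily_ingest_gb=2, replication=3,
                       data_nodes=3, disk_per_node_gb=1000)
print(f"Roughly {days:.0f} days of ingest before the data nodes fill up")
```

Under these assumptions disk space is not the limiting factor at the start; cores and RAM for the master services and Spark are more likely to be.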

Regards

Niranjan
