I want to build 10 Worker Hosts with High Availability.
I referenced the following guide documents.
My cluster services Kafka, HDFS, YARN, Hive on Spark, Hue, Oozie, Spark Streamnig, Sqoop and Impala.
I will deploy the host as shown below.
Can I place the Gateway on the Worker host?
I understand about HA in Namenode, but I have not yet learned HA for other services.
When I configure HA for all other services in the cluster, how many additional Hosts do I need?
For example, services such as cloudera manager, Yarn, Hive, Hue, Spark, etc.
I collect 90 GB of text log per day.
My cluster collects, ETLs, or aggregates (counts) text logs of 90 GB per day.
The cluster handles about 10 workflows.
If 32G of memory is allocated to the MasterNode and 64G of memory is used for the WorkNode, is the memory adequate?
I wait for your advice.
First of all thank you for your answer.
I decided to set the WorkerNode's memory to 128G.
In the picture of my question, the Utility is a Host( 1 ea ) that is deployed with ClouderaManager, Zookeeper, and Journalnode.
In this question,
What I want to know is how many Hosts I need.
I am ahead of hardware purchases and I have to decide how many Hosts I should buy.
According to this document, Gateway (Hue, Hive server2) and Utility (Oozie, Cloudera Manager Server) can be HA configuration.
I am planning one Utility Host and two Master Host.
I do not want to allocate an independent Host for the Hue, HiveServer2, and Oozie services.
So what I'm considering is putting them in the Master or Worker.