Explorer
Posts: 13
Registered: ‎08-23-2018

Hardware specifications

CDH 6.0.1

 

I want to build 10 Worker Hosts with High Availability.

I referenced the following guide documents.

 

https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_ig_host_allocations.html

https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_hardware_requirements.ht...

http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/

 

 

My cluster runs Kafka, HDFS, YARN, Hive on Spark, Hue, Oozie, Spark Streaming, Sqoop and Impala.

I will deploy the hosts as shown below.

 

 

Gateway.

Can I place the Gateway on the Worker host?

 

Utility.

I understand NameNode HA, but I have not yet learned how HA works for the other services.

If I configure HA for all the other services in the cluster, how many additional hosts do I need?

For example, services such as Cloudera Manager, YARN, Hive, Hue, and Spark.

 

Hardware Requirement.

My cluster collects, ETLs, and aggregates (counts) about 90 GB of text logs per day.
The cluster handles about 10 workflows.
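
To put the 90 GB per day figure in terms of disk, here is a rough sizing sketch. Only the 90 GB/day volume and the 10 worker hosts come from this post. The one-year retention and the 25% free-space headroom are assumptions for illustration, replication 3 is the HDFS default, and compression and intermediate ETL output are ignored.

```python
def raw_hdfs_gb(daily_gb, retention_days, replication=3, headroom=0.25):
    """Raw disk needed for replicated data plus free-space headroom.

    retention_days and headroom are assumed values, not from the thread.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    replicated = daily_gb * retention_days * replication
    return replicated / (1.0 - headroom)


def per_worker_gb(total_raw_gb, workers):
    """Spread the raw capacity evenly across the worker hosts."""
    return total_raw_gb / workers


if __name__ == "__main__":
    total = raw_hdfs_gb(daily_gb=90, retention_days=365)
    print(f"cluster raw capacity: {total:.0f} GB")
    print(f"per worker (10 hosts): {per_worker_gb(total, 10):.0f} GB")
```

Changing retention_days or headroom to match the real retention policy moves the result roughly linearly, so the retention period is the figure worth pinning down first.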

If I allocate 32 GB of memory to each master node and 64 GB to each worker node, is that adequate?

 

 

I look forward to your advice.

 

Master
Posts: 368
Registered: ‎07-01-2015

Re: Hardware specifications

Hi, I think your questions are answered in the doc links you provided. For example, for a setup with HA, Cloudera recommends a layout like the one you pictured.

Regarding Utility: all these services are non-HA services, so you can't run two Cloudera Managers in HA mode, nor the Impala Catalog.

Regarding GW: you can place it wherever you want, but the question is what the purpose of the GW is. If it is for running Hue, it may suffer from performance degradation because it will compete with the worker node processes for resources. Technically there is no issue with putting it on a worker node, but I do not recommend it.

Memory: if this is the only thing you will do on the cluster, then 64G is probably OK. Keep in mind that more users (for example via Hue) means more jobs (Spark, Hive) or Impala queries. You will also have to decide how much memory to give to YARN and how much to Impala. I would recommend 128G per worker.
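
The YARN versus Impala decision can be sketched as simple arithmetic. This is only an illustration: the 8 GB reserved for the OS, the 4 GB for the DataNode and NodeManager daemons, and the 40% share for Impala are hypothetical numbers, not Cloudera recommendations, and should be replaced with values from your own workload.

```python
def split_worker_memory(total_gb, os_gb=8, daemons_gb=4, impala_share=0.4):
    """Return (yarn_gb, impala_gb) after reserving OS and daemon memory.

    All default reservations and the Impala share are assumed values.
    """
    if not 0 <= impala_share <= 1:
        raise ValueError("impala_share must be between 0 and 1")
    usable = total_gb - os_gb - daemons_gb
    if usable <= 0:
        raise ValueError("not enough memory left after reservations")
    impala_gb = usable * impala_share
    yarn_gb = usable - impala_gb
    return yarn_gb, impala_gb


if __name__ == "__main__":
    # Compare the two worker sizes discussed in this thread.
    for total in (64, 128):
        yarn, impala = split_worker_memory(total)
        print(f"{total} GB worker: YARN {yarn:.1f} GB, Impala {impala:.1f} GB")
```

Under these assumed reservations the fixed overhead is the same on both sizes, so the extra 64 GB on a 128G worker goes entirely to YARN containers and Impala.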
Explorer
Posts: 13
Registered: ‎08-23-2018

Re: Hardware specifications

[ Edited ]

First of all, thank you for your answer.

I decided to set the worker nodes' memory to 128G.

In the picture in my question, the Utility is a single host (1 ea) running Cloudera Manager, ZooKeeper, and JournalNode.

 

What I want to know in this question is how many hosts I need.

I am about to purchase hardware, and I have to decide how many hosts I should buy.

 

https://www.cloudera.com/documentation/enterprise/latest/topics/admin_ha.html

According to this document, the Gateway roles (Hue, HiveServer2) and the Utility roles (Oozie, Cloudera Manager Server) can be configured for HA.

I am planning one Utility host and two Master hosts.

I do not want to allocate a separate host for the Hue, HiveServer2, and Oozie services.

So I am considering putting them on the Master or Worker hosts.
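
One thing that affects the host count is quorum size. ZooKeeper and the HDFS JournalNodes both need a majority of their members to be alive, so an ensemble of N members tolerates (N - 1) / 2 failures, rounded down. A minimal sketch of that rule follows. The function name is made up for illustration, and how the roles are actually spread over the Utility and Master hosts is an assumption to check against the picture.

```python
def tolerated_failures(ensemble_size):
    """Members that can fail while a majority quorum still survives."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one member")
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    # One member has no fault tolerance; three members survive one failure.
    for n in (1, 2, 3, 5):
        print(f"{n} members -> tolerates {tolerated_failures(n)} failure(s)")
```

By this rule a single ZooKeeper or JournalNode instance on the one Utility host tolerates no failures, while one instance on each of three hosts (for example the Utility host plus the two Master hosts) tolerates one.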

 

 

Master
Posts: 368
Registered: ‎07-01-2015

Re: Hardware specifications

Good, wise choice with 128G. Regarding HS2, just make sure it gets at least 4G of RAM and is able to handle up to 1000 open operations and 100 open connections. This is my estimate.
Explorer
Posts: 13
Registered: ‎08-23-2018

Re: Hardware specifications

Is there any problem with running HS2 and Hue on the Master hosts?