
Small Cluster Hardware Requirements

New Contributor

I need to set up a Hadoop cluster that meets the requirements below. Please help me with hardware specs for all the machines needed. Thanks!

 

ingestion: 10GB per day

replication: 3

number of years: 2 years

 


Re: Small Cluster Hardware Requirements

Community Manager

While you are waiting for someone more experienced to answer the question, allow me to get you started with a Community Knowledge article on the subject.

 

Selecting the Right Hardware for Your New Hadoop Cluster

 

 


Cy Jervis, Community Manager


Re: Small Cluster Hardware Requirements

Cloudera Employee

 

Hello kjvillas,

Welcome to the forums.

I do not know the workloads you will be placing on this cluster, or which other services (HBase, YARN, MapReduce, etc.) are going to run on it, so I can only give general guidance.

 

For this cluster, I can at least recommend an HDFS capacity of 21.9 TB.

 

I got that figure by taking your ingestion per day (10 GB) and multiplying it by the replication factor (3):

10 * 3 = 30

So we are looking at 30 GB of raw storage per day.

Now we multiply that by the number of days in 2 years (730):

730 * 30 = 21900

which gives us 21,900 GB, or 21.9 TB.
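
The same arithmetic can be written as a small Python sketch, which makes it easy to rerun with different inputs. The function name `hdfs_raw_capacity_gb` and its parameters are made up for illustration. Like the calculation above, it assumes 365-day years and decimal units (1 TB = 1000 GB), and it does not account for compression, growth in the ingest rate, or space used by anything other than HDFS.

```python
def hdfs_raw_capacity_gb(daily_ingest_gb, replication, years, days_per_year=365):
    """Raw HDFS capacity (in GB) needed to keep every ingested byte.

    Hypothetical helper: ingest per day * replication factor * days retained.
    """
    return daily_ingest_gb * replication * years * days_per_year


# Inputs from the original question: 10 GB/day, replication 3, 2 years.
total_gb = hdfs_raw_capacity_gb(daily_ingest_gb=10, replication=3, years=2)
print(f"{total_gb:,.0f} GB = {total_gb / 1000:.1f} TB")
```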

 

As far as hosts go, I recommend 5 hosts: 3 data nodes and 2 master nodes.

Each data node will need 9 TB of storage. The RAM and CPU specs depend on the response time you need from the cluster jobs. Most important, the data node disks need to be JBOD (just a bunch of disks), not a RAID setup.
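
One way to sanity-check the per-node figure is to split the 21.9 TB total evenly across the 3 data nodes and compare the result with the 9 TB recommendation. This is only a sketch: the helper name `per_node_share_tb` is made up, and it assumes HDFS spreads blocks evenly and ignores space needed for the OS, logs, and temporary job data. The reply does not say how the 9 TB figure was derived, so treating the difference as headroom is an interpretation.

```python
def per_node_share_tb(total_tb, data_nodes):
    """Even share of the raw HDFS capacity per data node (hypothetical helper)."""
    return total_tb / data_nodes


total_tb = 21.9           # raw HDFS capacity from the calculation in the reply
data_nodes = 3            # recommended number of data nodes
recommended_tb = 9.0      # recommended storage per data node

share_tb = per_node_share_tb(total_tb, data_nodes)
headroom_tb = recommended_tb - share_tb
print(f"share per node: {share_tb:.1f} TB, headroom at 9 TB: {headroom_tb:.1f} TB")
```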



The master nodes need to be beefier in terms of memory, as the NameNode stores all the block information in memory.

 

 

I hope this gives you some ideas :) 

------------------
Thanks,
Jason

Re: Small Cluster Hardware Requirements

New Contributor

Hi,

 

I would like to set up a cluster using CM 5.11.0. This cluster is going to be just for research, and the amount of ingested data will be very limited. However, I would like to take advantage of HBase, HDFS, Impala, YARN, ZooKeeper, and Spark.

I was wondering whether the following hardware spec would be enough just for testing purposes.

 

One name node: 2-hex cores CPU, 24 GB RAM, 100 GB Storage

Two data nodes: each with 2 cores CPU, 8 GB RAM, 100 GB Storage

 

Could you please let me know if this would work?

 

Thanks