I need to set up a Hadoop cluster that fits the following requirements. Please help me with hardware specs for the cluster. Thanks!
ingestion: 10GB per day
number of years: 2 years
While you are waiting for someone more experienced to answer the question, allow me to get you started with a Community Knowledge article on the subject.
Welcome to the forums.
I do not know what workloads you will be placing on this cluster, such as which other services are going to run (HBase, YARN, MapReduce, etc.), so I can only give a rough recommendation.
For this cluster I can at least recommend an HDFS capacity of 21.9 TB.
The way I found that out was to take your ingestion per day (10 GB) and multiply it by the replication factor (3).
10 * 3 = 30
So now we are looking at 30 GB of raw storage a day.
Next we multiply that by the number of days in 2 years (730).
730 * 30 = 21900
which gives us 21,900 GB, or 21.9 TB.
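The arithmetic above can be sketched as a small Python function. The function and parameter names are made up for illustration, and the estimate deliberately ignores compression, temporary job data, and free-space headroom, just like the calculation above.

```python
def hdfs_raw_capacity_gb(daily_ingest_gb, replication_factor, retention_days):
    """Raw HDFS storage needed: every ingested GB is stored once per replica."""
    return daily_ingest_gb * replication_factor * retention_days


# Figures from this thread: 10 GB/day, replication factor 3, 2 years = 730 days.
raw_gb = hdfs_raw_capacity_gb(daily_ingest_gb=10, replication_factor=3, retention_days=730)
raw_tb = raw_gb / 1000  # decimal terabytes, matching the 21.9 TB figure

print(f"{raw_gb} GB = {raw_tb} TB")
```

Changing any of the three inputs (for example a longer retention period) shows how quickly the raw capacity grows.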
As far as hosts go, I am going to recommend that you use 5 hosts:
3 data nodes and 2 master nodes.
The data nodes will need 9 TB of storage each. The RAM and CPU specs depend on the response time you need from the cluster jobs. Most importantly, the data node disks need to be JBOD (just a bunch of disks), not a RAID setup.
The master nodes need to be beefier in memory, as the NameNode stores all the block information in memory.
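As a rough check on the 9 TB figure, here is a minimal sketch of how the 21.9 TB spreads over 3 data nodes. It assumes the data is balanced evenly across the nodes, and the variable names are only illustrative.

```python
total_raw_tb = 21.9    # raw HDFS capacity from the calculation above
data_nodes = 3         # recommended number of data nodes
disk_per_node_tb = 9   # recommended storage per data node

# Each node's share of the raw data, assuming an even spread.
share_per_node_tb = total_raw_tb / data_nodes

# Fraction of each node's disks that would be full after 2 years.
utilization = share_per_node_tb / disk_per_node_tb

print(f"share per node: {share_per_node_tb:.1f} TB")
print(f"disk utilization after 2 years: {utilization:.0%}")
```

So 9 TB per node leaves some free space on top of the 7.3 TB each node would hold, which is useful for temporary job output and for keeping HDFS from running completely full.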
I hope this gives you some ideas :)
I would like to set up a cluster using CM 5.11.0. This cluster is going to be just for research, and the amount of ingested data will be very limited. However, I would like to take advantage of HBase, HDFS, Impala, YARN, ZooKeeper, and Spark.
I was wondering whether the following hardware spec would be enough for testing purposes:
One name node: 2 hex-core CPUs, 24 GB RAM, 100 GB storage
Two data nodes: each with a 2-core CPU, 8 GB RAM, 100 GB storage
Could you please let me know if this would work?