
Minimum number of nodes, and specs for a real cluster

New Contributor

Hi

 

I've been tasked with setting up a Hadoop cluster for testing a new big data initiative. However, I'm pretty much completely new to all of this. I know that one can set up a single-node cluster for proof of concept, but I would like to know the minimum number of nodes, and the specs (amount of RAM and disk space) per node, for a proper cluster. Assume low throughput, as it's only an initial test cluster (fewer than 10 users), and we only need the Kafka, HDFS, Pig and Hive services to run.

 

We generally have the ability to spin up CentOS 6 VMs with 4 GB RAM each, and I might be able to up that to 8 GB each. But reading many of the setup pages, they quote minimums of tens of GB of RAM (e.g. http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/), while the Cloudera Manager setup only asks for at least 4 GB on that node (http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cm_ig_cm_requirement...) and mentions nothing about the other nodes' specs.
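For context, here's my rough arithmetic on why 4 GB per node worries me once several daemons are co-located. The per-daemon heap sizes are just illustrative guesses on my part, not official minimums:

```python
# Rough arithmetic on co-locating daemons on a single 4 GB worker VM.
# Per-daemon heap sizes are illustrative guesses, not official minimums.
daemon_heap_mb = {
    "hdfs_datanode": 1024,
    "yarn_nodemanager": 1024,
    "kafka_broker": 1024,
    "os_and_overhead": 1024,  # kernel, page cache, CM agent, etc.
}

node_ram_mb = 4 * 1024  # the 4 GB VMs we can spin up

committed = sum(daemon_heap_mb.values())
print(f"committed: {committed} MB of {node_ram_mb} MB; "
      f"{node_ram_mb - committed} MB left for YARN containers")
```

With numbers like these, a 4 GB node is fully committed before any actual jobs run, which is why I'm unsure whether the 4 GB figure in the docs is realistic.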

 

Let me know if you need any more information. I realise it's probably too vague as is.

 

Cheers,

Ed

5 REPLIES

New Contributor
I'm in a similar situation, so I'm also interested in any feedback on Ed's question.

Contributor

Our initial test setup for a Hadoop cluster is:

 

1 NameNode (64 GB RAM + 24 cores) + 2 HDDs: 1 for the OS, 1 for HDFS storage.

3 DataNodes (each 32 GB RAM + 16 cores) + 2 HDDs: 1 for the OS, 1 for DFS storage.

   - the DataNodes are also used for: ZooKeeper, Kafka, Spark, YARN/MapReduce, Impala and the Pig/Hive gateway.

 

As a best practice for running a Hadoop environment, all servers should be bare metal, not VMs.

 

IMHO, you could make the NameNode server smaller, e.g. 32 GB of RAM with fewer cores. But for the DataNodes, I don't recommend going below those specs, especially the minimum memory.
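To gauge how small the NameNode could go, here's a back-of-the-envelope sketch. The "~1 GB of heap per million HDFS objects (files + blocks)" constant is a commonly quoted rule of thumb, and the 2 GB headroom is my own assumption:

```python
# Back-of-the-envelope NameNode heap estimate. The "~1 GB of heap per
# million HDFS objects (files + blocks)" constant is a commonly quoted
# rule of thumb; the 2 GB headroom is my own assumption.
def namenode_heap_gb(objects_millions, gb_per_million=1.0, headroom_gb=2.0):
    return objects_millions * gb_per_million + headroom_gb

# A small test cluster with ~5 million files + blocks:
print(f"~{namenode_heap_gb(5):.0f} GB of NameNode heap")
```

For a test cluster that's nowhere near millions of objects, the heap itself stays small; the 32-64 GB range is mostly about headroom for growth and co-located services.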

Champion Alumni

Our test cluster (on Amazon):

- 5 workers: m4.xlarge, 250 GB magnetic disk (we later increased the disks to 1 TB)

           * we used one of the 5 machines just for Flume (Kafka)

- 2 masters: m4.2xlarge, 125 GB SSD (we later decreased the memory and CPU ==> m4.xlarge)
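As a rough check on what that layout buys in HDFS terms, here's a capacity sketch. The replication factor of 3 and the ~20% non-DFS reserve are my assumptions, not figures from our setup:

```python
# Rough usable-capacity estimate for the worker layout above. Replication
# factor 3 and a ~20% non-DFS reserve (OS, logs, scratch) are assumptions,
# not figures from the post.
workers = 4               # 5 nodes minus the one dedicated to Flume/Kafka
disk_tb_per_worker = 1.0  # after the upgrade from 250 GB
replication = 3
non_dfs_reserve = 0.20

raw_tb = workers * disk_tb_per_worker
usable_tb = raw_tb * (1 - non_dfs_reserve) / replication
print(f"raw {raw_tb:.1f} TB -> ~{usable_tb:.1f} TB usable in HDFS")
```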

 

This was perfect for us for testing purposes.

 

GHERMAN Alina

New Contributor
Ed, what did you choose in the end? I'm in a similar position. We don't have BIG data yet (only a couple of TB), but we're planning for the future. Thinking of using Impala on top.

New Contributor

Hi,

As usual, it depends on what you need...

The Cloudera VM has 1 node with everything on it, which lets you see how it all fits together...

A fairly simple cluster could have 2-3 VMs for CM & masters and at least 3 VMs for workers; one possible layout is sketched below.
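For example (the hostnames and exact role placement here are just my illustration, not a Cloudera-prescribed topology):

```python
# One possible role layout for the "2-3 master VMs + at least 3 workers"
# shape described above. Hostnames and role placement are illustrative,
# not a Cloudera-prescribed topology.
layout = {
    "master1": ["Cloudera Manager", "ZooKeeper"],
    "master2": ["HDFS NameNode", "YARN ResourceManager", "Hive Metastore"],
    "worker1": ["HDFS DataNode", "YARN NodeManager", "Kafka broker"],
    "worker2": ["HDFS DataNode", "YARN NodeManager", "Kafka broker"],
    "worker3": ["HDFS DataNode", "YARN NodeManager", "Kafka broker"],
}

for host, roles in layout.items():
    print(f"{host}: {', '.join(roles)}")
```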

As I said, and as you can imagine, it depends on what you want to test on it.

Believe me, you really need a Cloudera admin to get what you want...

In another thread I referred to this blog.

I hope this will help you.