Hi
I've been tasked with setting up a Hadoop cluster to test a new big data initiative, but I'm pretty much completely new to all of this. I know you can set up a single-node cluster for a proof of concept, but I'd like to know the minimum number of nodes, and what spec (amount of RAM and disk space) each one needs, for a proper cluster. Assume low throughput, as it's only an initial test cluster (fewer than 10 users), and we only need the Kafka, HDFS, Pig and Hive services to run.
We can generally spin up CentOS 6 VMs with 4GB of RAM each, and I might be able to push that to 8GB each. But reading many of the setup pages, they quote minimums of tens of GB of RAM (e.g. http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/), whereas the Cloudera Manager setup only asks for at least 4GB on that node (http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cm_ig_cm_requirement... and says nothing about the other nodes' specs.
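For what it's worth, here's the rough back-of-envelope maths I've been using to think about the disk side of the sizing question. All of the figures are just placeholder assumptions on my part (data volume, headroom, disk per VM), so please correct me if the approach itself is wrong:

```python
# Back-of-envelope HDFS sizing sketch -- every figure below is a placeholder
# assumption for a small test cluster, not a recommendation.

expected_data_gb = 500       # assumed raw data volume for the pilot (placeholder)
replication_factor = 3       # HDFS default block replication
overhead_factor = 1.25       # headroom for temp/intermediate output (assumption)
disk_per_node_gb = 1000      # usable disk per VM (placeholder)

raw_needed_gb = expected_data_gb * replication_factor * overhead_factor
min_datanodes = -(-raw_needed_gb // disk_per_node_gb)  # ceiling division

print(f"Raw HDFS capacity needed: {raw_needed_gb:.0f} GB")
print(f"Minimum DataNodes at {disk_per_node_gb} GB each: {int(min_datanodes)}")
```

That only covers disk, though; it's the RAM side (and how many nodes the master roles really need) that I'm most unsure about.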
Let me know if you need any more information. I realise it's probably too vague as is.
Cheers,
Ed