We're planning to deploy a 5-node CDH cluster (1 NameNode, 4 DataNodes), managed by Cloudera Manager, for internal development and experimentation.
Can anyone suggest the recommended hardware requirements (especially OS disk capacity for both the NameNode and the DataNodes) that we can follow for our environment?
We already found this blog post ( https://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/ ), but its recommendations seem like too much for our experimental cluster.
It depends on how you plan to use the cluster.
Will you use this cluster for real-time or batch processing?
Which technologies will you use? YARN, Spark, Hive, Impala, Flume, Kafka…
Will it be a typical Hadoop cluster or a Kudu cluster?