Created on 09-04-2014 12:32 AM - edited 09-16-2022 02:06 AM
I am planning to build a Cloudera cluster. The application is text processing on web pages. As Hadoop beginners, we would like to start at a small scale.
Could anyone share the requirements for a small Cloudera cluster? For example, the CPU, RAM, hard disk drives, and number of nodes in the cluster?
Thanks.
Created 09-04-2014 04:15 AM
Created 09-05-2014 01:26 AM
Thanks for your advice.
Something has confused me since my first look at Hadoop. When we talk about a server nowadays, we usually mean a VM on VMware ESXi or a similar VM platform. Hadoop, however, relies on being distributed across machines.
I have been unsure about this for a long time. For a production deployment, can I allocate VMs on the same physical server to run Cloudera? I am asking about the normal practice.
Created 09-08-2014 05:16 AM
Created 09-10-2014 11:38 PM
Thanks for your advice.
Currently, I'm familiarizing myself with Hadoop by building the cluster with several VMs on a Linux host. I will get a copy of the books you mentioned!
Now, I hope my understanding is correct: in a production environment, each node corresponds to a physical server in a rack. If I want to set up a 4-node cluster, I will probably have four 1U servers in my rack.
It seems I'd better go for AWS or Google Cloud first. Is there any good option? One thing I wonder about: when we use AWS, we are actually using VMs.
Thanks.