Advice on Hardware Specification

New Contributor

I am planning to build a Cloudera cluster. The application is text processing on web pages. As Hadoop beginners, we would like to start at a small scale.

Could anyone share the requirements for a small Cloudera cluster? For example, the CPU, RAM, hard disk drives, and number of nodes in the cluster?

Thanks.

1 ACCEPTED SOLUTION

Hello Charles, as a beginner it would be easier if you experimented with
Hadoop on AWS instances before buying hardware. You can begin by building a
simple 3-4 node cluster. The hardware requirements depend on your planned
work, but you can begin with nodes with 8GB RAM and storage sized to your
data set. Get familiar with the software and then look to scale upward.

Regards,
Gautam Gopalakrishnan
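
To make "storage sized to your data set" a little more concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration, not an official sizing method: the function name and the 25% headroom figure are assumptions made up for this example, and the replication factor of 3 is simply the HDFS default (`dfs.replication`), which you may well change.

```python
import math


def raw_storage_per_node_gb(dataset_gb, nodes, replication=3, headroom=0.25):
    """Rough estimate of raw disk needed per node, in GB.

    replication: copies HDFS keeps of each block (3 is the HDFS default).
    headroom:    assumed extra fraction for temporary and intermediate
                 data; 0.25 is an illustrative guess, not a recommendation.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    total_raw = dataset_gb * replication   # each block is stored `replication` times
    total_raw *= 1 + headroom              # leave spare space on top of the replicas
    return math.ceil(total_raw / nodes)    # spread evenly across the nodes


# Example: a 500 GB data set on a 4-node cluster
print(raw_storage_per_node_gb(500, 4))
```

The point is only that the raw disk you need is several times the size of the data set itself, so it is worth doing this arithmetic before picking instance or disk sizes.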

New Contributor

Thanks for your advice.

One thing has confused me since I first looked at Hadoop. Nowadays, when we talk about a server, we usually mean a VM on VMware ESXi or a similar virtualization platform. Hadoop, however, relies on being distributed across machines.

For a production deployment, can I run Cloudera on several VMs allocated on the same physical server? I would like to know what the normal practice is.

If you are only looking to learn, you are fine with using multiple VMs on
the same host. But performance will be poor if the VMs are starved for CPU
or if they share disks.

It looks like you are just beginning to use Hadoop, so I would suggest first
getting up to speed with installation and configuration rather than
performance. Get yourself a copy of these two books:

- Hadoop Operations / Eric Sammer
http://shop.oreilly.com/product/0636920025085.do

- Hadoop: The Definitive Guide
http://shop.oreilly.com/product/9780596521981.do

Regards,
Gautam Gopalakrishnan

New Contributor

Thanks for your advice.

I am currently getting familiar with Hadoop by building a cluster from several VMs on a Linux host. I will get copies of the books you mentioned!

I hope my understanding is correct: in a production environment, each node corresponds to a physical server in a rack. If I want to set up a 4-node cluster, I will probably have four 1U servers in my rack.

It seems I had better start with AWS or Google Cloud. Is there any good option? I just wonder, because when we use AWS we are actually using VMs as well.

Thanks.