Advice on Hardware Specification
- Labels: Apache Hadoop
Created on ‎09-04-2014 12:32 AM - edited ‎09-16-2022 02:06 AM
I am planning to build a Cloudera cluster for an application that does text processing on web pages. As we are beginners with Hadoop, we would like to start at a small scale.
Could anyone share the requirements for a small Cloudera cluster? For example, the CPU, RAM, hard disk drives, and number of nodes in the cluster?
Thanks.
Created ‎09-04-2014 04:15 AM
Consider trying Hadoop on AWS instances before buying hardware. You can begin by building a
simple 3-4 node cluster. The hardware requirements depend on your planned
work, but you can begin with nodes with 8GB RAM and storage sized for your
data set. Get familiar with the software and then look to scale upward.
Gautam Gopalakrishnan
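To make "storage sized for your data set" a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not an official Cloudera sizing formula. It assumes the HDFS default replication factor of 3, and the 25% headroom for temporary and intermediate data is only an illustrative assumption, as is the function name `raw_storage_per_node_gb`.

```python
import math


def raw_storage_per_node_gb(dataset_gb, nodes, replication=3, headroom=0.25):
    """Estimate the raw disk needed on each worker node, in GB.

    dataset_gb  -- size of the data you plan to store in HDFS
    nodes       -- number of worker nodes holding HDFS data
    replication -- HDFS replication factor (3 is the HDFS default)
    headroom    -- fraction of disk kept free for temporary and
                   intermediate data (0.25 is an assumption, not a rule)
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")

    # Every block is stored `replication` times across the cluster.
    replicated_gb = dataset_gb * replication
    # Inflate the total so that `headroom` of the disk stays free.
    total_raw_gb = replicated_gb / (1 - headroom)
    # Spread the total evenly over the worker nodes, rounding up.
    return math.ceil(total_raw_gb / nodes)


# Example: 500 GB of web pages on a 4-node cluster.
print(raw_storage_per_node_gb(500, 4))  # 500
```

With these assumptions, 500 GB of data on four nodes works out to about 500 GB of raw disk per node. Treat the result as a starting point and adjust it once you see how your jobs actually behave.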
Created ‎09-05-2014 01:26 AM
Thanks for your advice.
One thing has confused me since I first looked at Hadoop. When we talk about a server nowadays, we usually mean a VM on VMware ESXi or a similar VM platform. Hadoop, however, relies on being distributed across machines.
This has confused me for a long time. For a production deployment, can I allocate VMs on the same server to run Cloudera? I would like to know what the normal practice is.
Created ‎09-08-2014 05:16 AM
You can run the VMs on the same host, but performance will be poor if the VMs are starved for CPU
or if they share disks.
It looks like you are just beginning to use Hadoop, so I would suggest first
getting up to speed with installation and configuration rather than
performance. Get yourself a copy of these two books:
- Hadoop Operations / Eric Sammer
http://shop.oreilly.com/product/0636920025085.do
- Hadoop: The Definitive Guide
http://shop.oreilly.com/product/9780596521981.do
Gautam Gopalakrishnan
Created ‎09-10-2014 11:38 PM
Thanks for your advice.
Currently, I'm getting familiar with Hadoop by building a cluster with several VMs on a Linux host. I will get copies of the books you mentioned!
I hope my understanding is correct: in a production environment, each node corresponds to a physical server in a rack. If I want to set up a 4-node cluster, I will probably have four 1U servers in my rack.
It seems I'd better go for AWS or Google Cloud first. Is there a good option? I just wonder, since when we use AWS, we are actually using VMs.
Thanks.
