
3-Node Hadoop Cluster

New Contributor

Hello, I am trying to set up a Hadoop environment with commodity hardware. In short, I want to perform predictive analysis and learn more about the Hadoop environment by analyzing Twitter data with various algorithms such as clustering, topic modeling, and sentiment analysis. The size of my data set is 10 GB, with approximately 10,000,000 tweets, and I have the following hardware specification:


One desktop with a quad-core processor and 8 GB of RAM,
and two desktops with quad-core processors and 4 GB of RAM.

My question:
Is this hardware specification sufficient for the tasks above, or do I need to upgrade the hardware?

5 REPLIES

Explorer

Hi Ahmed,

 

The given hardware specification is sufficient to run a 3-node cluster. Make sure you have an ample amount of disk space to run these operations, approximately 10 TB on each node.

 

Thanks and Regards,

ZKhan

New Contributor

Hello ZKhan,

Thanks for the reply. If I understand your response correctly, the hardware is sufficient provided each machine has a 10 TB hard drive. But my whole data set is only 10 GB, so why do I need 10 TB on each machine?

 

 

Thank you in advance,

Explorer

Hello,

 

That much disk space is recommended so that HDFS operations run smoothly. It's nice to have, but not a hard requirement!

 

Thanks,

ZKhan

Master Collaborator

It depends a lot on just what you mean by "analysis". One machine could be just fine. In general I think you will want to play with Spark, and Spark loves memory, so 8 GB of RAM seems a bit small, but 4 cores is OK, and I bet you have plenty of disk space.
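
For example (just a rough sketch, not tuned numbers; the app name and HDFS path are hypothetical), a local PySpark session sized for one of those 8 GB desktops might look something like this:

from pyspark.sql import SparkSession

# Rough sketch: a PySpark session sized for a single 8 GB, quad-core desktop.
# Note: in client mode the driver memory generally has to be set before the
# JVM starts (e.g. spark-submit --driver-memory 4g), so treat the .config()
# call below as illustrative.
spark = (
    SparkSession.builder
    .appName("tweet-analysis")            # hypothetical app name
    .master("local[4]")                   # use all 4 cores on the one machine
    .config("spark.driver.memory", "4g")  # ~half the RAM, leaving room for the OS
    .getOrCreate()
)

# Hypothetical path: the 10 GB tweet data set stored in HDFS.
tweets = spark.read.text("hdfs:///data/tweets")
print(tweets.count())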

 

I do not agree at all that you need 10TB of disk space. That is orders of magnitude overkill for a 10GB data set.
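
As a rough back-of-the-envelope check (the headroom multiplier below is just an assumption; the replication factor of 3 is the HDFS default):

# Back-of-the-envelope HDFS storage estimate for the 10 GB data set.
raw_data_gb = 10        # size of the tweet data set
replication = 3         # HDFS default replication factor
headroom = 4            # assumed multiplier for intermediate and derived data

total_gb = raw_data_gb * replication * headroom
print(f"Rough cluster-wide HDFS footprint: {total_gb} GB")
# => 120 GB across the whole cluster, i.e. a few tens of GB per node,
#    nowhere near 10 TB per machine.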

New Contributor

Hello srowen,

By "analysis" I mean running Mahout algorithms on top of the Hadoop cluster, as well as MapReduce preprocessing tasks such as tokenization, stemming, and translation of the 10 million tweets.
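
For example, the tokenization pass could be something as simple as a Hadoop Streaming mapper along these lines (just a sketch, assuming plain-text input with one tweet per line; stemming and translation would be separate steps):

#!/usr/bin/env python
# Sketch of a Hadoop Streaming mapper that lower-cases and tokenizes tweets.
# Run as the mapper of a map-only job with the standard hadoop-streaming jar.
import re
import sys

TOKEN_RE = re.compile(r"[a-z0-9#@']+")

for line in sys.stdin:
    tokens = TOKEN_RE.findall(line.lower())
    if tokens:
        print(" ".join(tokens))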

 

Thank you.