
3-Node Hadoop Cluster

New Contributor

Hello, I am trying to set up a Hadoop environment with commodity hardware. In short, I want to perform predictive analysis and learn more about the Hadoop environment by analyzing Twitter data with various algorithms such as clustering, topic modeling, and sentiment analysis. My data set is 10 GB, with approximately 10,000,000 tweets, and I have the following hardware specification:


One desktop with a quad-core processor and 8 GB RAM,
and two desktops with quad-core processors and 4 GB RAM.

My question:
Is the hardware specification sufficient for the given tasks, or do I need to upgrade the hardware?


5 REPLIES

Explorer

Hi Ahmed,

 

The given hardware specification is sufficient to run a cluster of 3 nodes. Make sure you have an ample amount of disk space to run these operations, approximately 10 TB each.

 

Thanks and Regards,

ZKhan

New Contributor

Hello ZKhan,

 

Thanks for the reply. By your response, you mean the hardware is sufficient on the condition that each machine has a 10 TB hard drive. But my whole data set is 10 GB, so why do I need 10 TB for each machine?

 

 

Thank you in advance.
Explorer

Hello,

 

It's recommended to have that much disk space to run HDFS operations smoothly. It's nice to have, but not a must-have!

 

Thanks,

ZKhan

Master Collaborator

It depends a lot on just what you mean by 'analysis'. 1 machine could be just fine. In general I think you will want to play with Spark, and Spark loves memory. So 8GB RAM seems a bit small, but 4 cores is OK, and I bet you have plenty of disk space.
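
For illustration only (not from the thread), a minimal PySpark configuration sized for the 4 GB worker nodes might look like the sketch below; the app name is a placeholder, and the exact numbers depend on what else runs on each machine:

```python
from pyspark import SparkConf, SparkContext

# Rough executor sizing for the smaller (4 GB) nodes: leave a couple of GB
# for the OS and the HDFS/YARN daemons, give the rest to the Spark executor.
conf = (SparkConf()
        .setAppName("tweet-analysis")        # hypothetical app name
        .set("spark.executor.memory", "2g")  # per-node executor heap
        .set("spark.executor.cores", "4"))   # the quad-core desktops

sc = SparkContext(conf=conf)
```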

 

I do not agree at all that you need 10TB of disk space. That is orders of magnitude overkill for a 10GB data set.
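
A quick back-of-the-envelope calculation, assuming HDFS's default replication factor of 3 and generous room for intermediate output, shows why: the 10 GB data set needs on the order of tens of GB across the whole cluster, not terabytes.

```python
# Rough HDFS footprint for the 10 GB data set. The scratch multiplier is an
# assumption, not a figure from the thread.
raw_gb      = 10
replication = 3   # HDFS default (dfs.replication)
scratch     = 3   # rough allowance for MapReduce/Spark intermediate data

print(raw_gb * replication * scratch, "GB")  # ~90 GB across the cluster
```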

New Contributor

Hello srowen,

 

What I meant by analysis is running Mahout algorithms on top of the Hadoop cluster, as well as MapReduce preprocessing tasks, for instance tokenization, stemming, and translation of the 10 million tweets.
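
As an illustration only (not code from the thread), that kind of preprocessing could be sketched in PySpark with NLTK's Porter stemmer; the HDFS paths and app name below are placeholders:

```python
from pyspark import SparkContext
from nltk.stem.porter import PorterStemmer

sc = SparkContext(appName="tweet-preprocessing")
stemmer = PorterStemmer()

def preprocess(line):
    # Assumes one raw tweet per line; naive whitespace tokenization.
    tokens = line.lower().split()
    return [stemmer.stem(t) for t in tokens if t.isalpha()]

# Placeholder input/output paths, not paths from the thread.
tweets = sc.textFile("hdfs:///user/hadoop/tweets.txt")
tweets.map(preprocess).saveAsTextFile("hdfs:///user/hadoop/tweets_stemmed")
```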

 

Thank you