RMG
Explorer
Posts: 9
Registered: ‎09-02-2015

Cloudera and Cloud Computing

Please accept my sincere salutations.

 

I installed a Cloudera 5.3 virtual machine and also installed RStudio on it. I ran some tasks in RStudio locally on a small dataset, and now I would like to repeat the same tasks on a large dataset (that is, Big Data). I can't do that locally, so I need to do it in the cloud.

Could you advise me on the best way to do this job in the cloud? Do Cloudera or RStudio allow tasks to be moved to the cloud? What is the best free simulator (if one exists)?

Thank you for your suggestions!!

 

 

Cloudera Employee Sue
Cloudera Employee
Posts: 44
Registered: ‎09-11-2015

Re: Cloudera and Cloud Computing

Hi RMG,

There is a package called RHadoop which will allow you to write R programs that run in a Hadoop cluster. For more information, see: RHadoop
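For anyone new to the programming model that RHadoop builds on, here is a minimal sketch of the map, shuffle and reduce steps as a word count. It is plain Python running in one process, not R and not the RHadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. On a real cluster, Hadoop does the shuffle and distributes the work across nodes.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three steps; Hadoop would run them across many nodes."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

The point of the model is that the map and reduce functions only ever see one record or one key at a time, which is what lets the same code scale from a single-node VM to a larger cluster.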

 

Before moving right to a cloud-based solution, I would suggest trying out RHadoop on your VM first. The best option, and the tool I use most, is Cloudera's Quickstart VM. You can download your preferred version of the VM and run it on your laptop. This is essentially a single node cluster; while you can't use it for Big Data processing, it is a great way to get some experience with RHadoop. I would recommend doing this as a learning step prior to creating a cluster in a public (or private) cloud.

 

To install RHadoop, google for instructions. I found these instructions, but have not tested: RHadoop Installation in Cloudera Quickstart VM

 

Finally, you can spin up a Hadoop cluster in your favorite public cloud and follow the same link above to install RHadoop in the cluster.

If you are unfamiliar with using public clouds, it is best to do some reading first. Take care when running your cluster: fees add up quickly if you leave your VMs (instances) running for a long time. Monitor the charges and shut down the VMs when they are not in use.

 

Also see Cloudera Director, Cloudera Live, and the Cloudera Demo tutorial here: http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-live.html

 

HTH

RMG
Explorer
Posts: 9
Registered: ‎09-02-2015

Re: Cloudera and Cloud Computing

Thank you for your suggestions, Sue. That is exactly what I did: I installed RHadoop under Cloudera, along with R and RStudio, to do my analysis (single-node cluster). I completed this step successfully, and now I am looking for a free public cloud that lets me run my code on a large dataset. But what I have found does not meet my needs, because none of the clouds I found are free.

What can I do, please?

Cloudera Employee Sue
Cloudera Employee
Posts: 44
Registered: ‎09-11-2015

Re: Cloudera and Cloud Computing

That's great, RMG. I don't know of a free 'simulator', so I can't help you there. However, cloud vendors do provide trial offers, such as Google for Google Cloud Platform. There is currently a 60 day/$300 trial offer here: https://cloud.google.com/free-trial/ You may also find similar Amazon AWS trials available.
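To get a feel for how far a trial credit goes, here is a rough back-of-the-envelope sketch in Python. Only the $300 and 60 day figures come from the offer above; the node count and the hourly rate are made-up placeholders, not real prices, so check the vendor's current price list before relying on any number.

```python
def trial_hours(credit_usd, nodes, rate_per_node_hour):
    """Hours a cluster of `nodes` instances can run before the credit is spent."""
    if nodes <= 0 or rate_per_node_hour <= 0:
        raise ValueError("nodes and rate_per_node_hour must be positive")
    return credit_usd / (nodes * rate_per_node_hour)


CREDIT_USD = 300.0         # from the trial offer mentioned above
TRIAL_DAYS = 60            # from the trial offer mentioned above
NODES = 4                  # assumption: a small 4-node cluster
RATE_PER_NODE_HOUR = 0.25  # assumption: hypothetical price in USD, not a real quote

if __name__ == "__main__":
    hours = trial_hours(CREDIT_USD, NODES, RATE_PER_NODE_HOUR)
    # The credit also expires when the trial period ends, whichever comes first.
    usable_hours = min(hours, TRIAL_DAYS * 24)
    print(f"Cluster hours available: {usable_hours:.1f}")
    print(f"Equivalent days if left running nonstop: {usable_hours / 24:.1f}")
```

With these placeholder numbers, a cluster left running around the clock would use up the credit well before the 60 days are over, which is why shutting instances down between sessions matters.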

 

 

Explorer
Posts: 14
Registered: ‎05-06-2014

Re: Cloudera and Cloud Computing

I would recommend the SparkR package, which works similarly to the dplyr package. I find it a lot easier to use than RHadoop, which is still based on MapReduce under the hood. The big data community is moving rapidly towards Spark. For more information about SparkR, please see the Cloudera community post below.

 

https://community.cloudera.com/t5/Data-Science-and-Machine/Spark-R-in-Cloudera-5-3-0/td-p/37706
