Run SparkR or R package on my Cloudera 5.9 Spark

Expert Contributor

Hi,

I have a 3-node cluster running Cloudera 5.9 on CentOS 6.7. I need to connect my R packages (running on my laptop) to Spark running in cluster mode on Hadoop.

However, when I try to connect local R to the Hadoop Spark through sparklyr, it gives an error, because it searches for the Spark home on the laptop itself.

I googled and found that we can install SparkR and use R with Spark. However, I have a few questions about this.

 

  1. I have downloaded the tar file from https://amplab-extras.github.io/SparkR-pkg/. Can I copy it directly to my Linux server and install it?
  2. Do I have to stop or delete my existing Spark, which is not standalone and runs on YARN (i.e. in cluster mode)? Or can SparkR just run on top of it if I install it on the server?
  3. Or do I have to run Spark in standalone mode (get the Spark gateways running and start the master/slave using the script) and install the package from the Linux command line on top of it?
  4. If it gets installed, will I be able to access it through the CM UI?

Please help. I am new to this and really need guidance.

Thanks,

Shilpa

1 ACCEPTED SOLUTION

Expert Contributor

Thanks for the reply, @srowen.

The best way to install R and then install SparkR on top of it is described here: http://blog.clairvoyantsoft.com/2016/11/installing-sparkr-on-a-hadoop-cluster/

I was able to install both by following this link. It is really useful and up to date.

Thanks,

Shilpa

13 REPLIES

New Contributor


Hi AutoIN,

This link opens fine: http://site.clairvoyantsoft.com/installing-sparkr-on-a-hadoop-cluster/
but the link in step f under installation (https://github.com/apache/spark/archive/) is not working as expected.

Can you please provide the correct location? We are using CDH 6.3.3 and the Spark version is 2.4.0.

Master Collaborator

Hello @PR_224

Please replace steps f to j with what @singh101 suggested in one of the comments above:

https://community.cloudera.com/t5/Support-Questions/Run-SparkR-or-R-package-on-my-Cloudera-5-9-Spark... . The idea is that we use the binaries from the CDH parcel instead of downloading them from upstream.
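
As a rough sketch of that idea (not the exact steps from the linked comment), the snippet below picks a Spark home from a list of candidate directories by checking for `bin/spark-submit`. The `find_spark_home` helper is hypothetical, and `/opt/cloudera/parcels/CDH/lib/spark` is only the usual default parcel location, so adjust the path if your parcels live elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that looks like
# a Spark home, i.e. contains an executable bin/spark-submit.
find_spark_home() {
    for dir in "$@"; do
        if [ -x "$dir/bin/spark-submit" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Assumption: parcels are installed under the default /opt/cloudera/parcels root.
if SPARK_HOME=$(find_spark_home /opt/cloudera/parcels/CDH/lib/spark); then
    export SPARK_HOME
    echo "Using SPARK_HOME=$SPARK_HOME"
else
    echo "No Spark home found under the assumed parcel path" >&2
fi
```

With `SPARK_HOME` pointing at the parcel, the remaining install steps can work against the Spark that the cluster already runs, rather than a separately downloaded copy.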

 

On a side note: CDP Base provides SparkR out of the box (in case you plan to upgrade in the near future).

 

Good luck!

New Contributor

Thank you, AutoIN. It had already worked earlier; I just took a while to respond.

Master Collaborator

No worries @PR_224 
Glad it's fixed : )