Generally speaking, you will need to have connectivity from your laptop to at least one machine in the cluster (the gateway), and have some local configuration for sparklyr that indicates where the cluster is. I haven't tried this with sparklyr, but for other R-Hadoop libraries like rhdfs, it means having a copy of the HADOOP_CONF_DIR files from the cluster locally. It also means you probably need the same version of Spark binaries locally as are on the cluster. This is challenging.
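For what it's worth, a rough sketch of the "local copy of the config" step might look like the following. I have not run this against your cluster. The gateway address and the local directory name are placeholders, and /etc/hadoop/conf is only the usual location of the client configuration on a CDH node, so check where it lives on yours.

```shell
#!/bin/sh
# Hypothetical values: set GATEWAY to something like user@gateway.example.com.
GATEWAY="${GATEWAY:-}"
LOCAL_CONF="${LOCAL_CONF:-$HOME/cluster-hadoop-conf}"   # placeholder directory

# Copy the client configuration from the gateway, if one was given.
# Assumption: the cluster's client config lives in /etc/hadoop/conf.
if [ -n "$GATEWAY" ]; then
    mkdir -p "$LOCAL_CONF"
    scp -r "$GATEWAY:/etc/hadoop/conf/." "$LOCAL_CONF/"
fi

# Point local Hadoop/Spark clients at the copied configuration.
export HADOOP_CONF_DIR="$LOCAL_CONF"
export YARN_CONF_DIR="$LOCAL_CONF"

# Sanity check: warn if the expected site files are not there.
for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$HADOOP_CONF_DIR/$f" ]; then
        echo "missing $f in $HADOOP_CONF_DIR" >&2
    fi
done
```

This only covers the configuration side. Matching the local Spark version to the cluster is a separate step.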
SparkR is also something you can try to get working. You would probably need to use an upstream SparkR version that's similar to the CDH Spark you're using (1.x vs 2.x) and then just try to run ./bin/sparkR from its distribution.
Standalone mode isn't supported. Neither SparkR nor sparklyr is supported by Cloudera, so neither has any relationship to Cloudera Manager (CM). You should not modify your existing Spark service, and you shouldn't have to.
I already went to the link you mentioned. It gives an example of how to connect to a local Spark, which I have been able to do. However, when I try to connect to my remote Spark cluster running on Cloudera, it gives an error:
4: running command '"C:\Users\diegot\Desktop\hdfs:\188.8.131.52:8020\opt\cloudera\parcels\CDH-5.9.0-1.cdh5.9.0.p0.23\lib\spark\tmp\hadoop\bin\winutils.exe" chmod 777 "C:\Users\diegot\Desktop\hdfs:\184.108.40.206:8020\opt\cloudera\parcels\CDH-5.9.0-1.cdh5.9.0.p0.23\lib\spark\tmp\hive"' had status 127
Once I have the package, do I just untar it on the NameNode, go to the bin directory, and execute it? Is that it?
To install R on the Spark host, I got the EPEL RPM and then tried to install R using yum, but it gives an error. I tried some other RPMs, but they give errors too. Using the --skip-broken option doesn't work either. Please help.
Generally, you won't be able to run R on your laptop/workstation and connect it remotely to the cluster. It's possible, but would require more setup and configuration, so I would avoid this deployment for now. Instead, run R on a cluster gateway node.
You are using a standalone master, which isn't supported anyway. You would want to use YARN.
Although you should be able to use your own copy of SparkR 1.6 with the cluster, I don't know if it works. It's not supported. sparklyr is another option, which at least is supported by RStudio.
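As a sketch of what that could look like on a gateway node (untested against CDH, so treat it as a starting point): the SPARK_HOME path below is a hypothetical location for an unpacked upstream Spark 1.6 tarball, and /etc/hadoop/conf is assumed to be the gateway's client configuration directory.

```shell
#!/bin/sh
# Assumption: an upstream Spark 1.6 distribution is unpacked here (hypothetical path).
SPARK_HOME="${SPARK_HOME:-$HOME/spark-1.6.0-bin-hadoop2.6}"
# Assumption: the gateway's client configuration is in /etc/hadoop/conf.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
export SPARK_HOME HADOOP_CONF_DIR

# Use YARN in client mode instead of a standalone master (spark://host:7077).
SPARKR_CMD="$SPARK_HOME/bin/sparkR --master yarn-client"

if [ -x "$SPARK_HOME/bin/sparkR" ]; then
    # Intentionally unquoted so the command splits into program and arguments.
    $SPARKR_CMD
else
    echo "sparkR not found; would run: $SPARKR_CMD" >&2
fi
```

With a Spark 2.x distribution the equivalent would be `--master yarn --deploy-mode client`.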
This worked fine for me as well. There were just a few extra things I had to do:
1. In the testing section, when I typed sparkR, it errored out. It seems you have to create links for that to work. In my case I had a CDH parcel installation, so I created the two links below, and it worked fine thereafter: