Created on 01-03-2017 10:41 AM - edited 09-16-2022 03:52 AM
Hi,
I have a 3-node cluster running Cloudera 5.9 on CentOS 6.7. I need to connect my R packages (running on my laptop) to Spark running in cluster mode on Hadoop.
However, when I try to connect my local R to the Hadoop Spark through sparklyr, it fails with an error, because it looks for the Spark home on the laptop itself.
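To illustrate the failure described above, here is a minimal Python sketch of the kind of check involved. It assumes the client looks up a `SPARK_HOME` environment variable on the local machine and expects a `bin/spark-submit` launcher underneath it. The function name `find_local_spark_home` is hypothetical and is not part of sparklyr or SparkR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_local_spark_home(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return SPARK_HOME if it holds bin/spark-submit, else None.

    Hypothetical helper for illustration only; it mimics the assumption
    that a client needs a local Spark installation to launch from.
    """
    spark_home = env.get("SPARK_HOME", "")
    if not spark_home:
        return None
    candidate = Path(spark_home)
    # A Spark distribution ships its launcher script as bin/spark-submit.
    if (candidate / "bin" / "spark-submit").is_file():
        return candidate
    return None


if __name__ == "__main__":
    found = find_local_spark_home()
    if found is None:
        print("No usable SPARK_HOME on this machine")
    else:
        print(f"Spark found at {found}")
```

On a laptop with no Spark installation the function returns `None`, which matches the situation here: the cluster has Spark, but the machine running R does not.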
I googled and found that we can install SparkR and use R with Spark. However, I have a few questions about this.
Please help; I am new to this and really need guidance.
Thanks,
Shilpa
Created 01-11-2017 04:20 PM
Thanks for the reply @srowen
The best way to install R and then install SparkR on top of it is described here: http://blog.clairvoyantsoft.com/2016/11/installing-sparkr-on-a-hadoop-cluster/
I was able to install both by following this link. It is really useful and up to date.
Thanks,
Shilpa
Created 01-05-2021 12:10 PM
Hi AutoIN,
This link opens fine: http://site.clairvoyantsoft.com/installing-sparkr-on-a-hadoop-cluster/
but the link in step f under installation does not work as expected: https://github.com/apache/spark/archive/.
Could you please provide the correct location? We are using CDH 6.3.3 and the Spark version is 2.4.0.
Created on 01-06-2021 01:55 AM - edited 01-06-2021 01:58 AM
Hello @PR_224
Please replace steps f to j with what @singh101 suggested in one of the above comments:
https://community.cloudera.com/t5/Support-Questions/Run-SparkR-or-R-package-on-my-Cloudera-5-9-Spark... . The idea is that we use the binaries from the CDH parcel instead of downloading them from upstream.
On a side note: CDP Base provides SparkR out of the box (in case you plan to upgrade in the near future).
Good luck!
Created 03-02-2021 05:56 PM
Thank you, AutoIN. It had already worked earlier; I just took a while to respond.
Created 03-02-2021 06:09 PM
No worries @PR_224
Glad it's fixed : )