11-09-2017 07:10 AM - last edited on 11-09-2017 07:37 AM by cjervis
After checking some resources in the community, I see that sparklyr is recommended for supporting R on Spark with YARN. In my case, I just want to let data scientists submit their R jobs to Spark on YARN, even in YARN client mode. The data scientists suggested installing the R console/RStudio on the gateway/edge node of the CDH cluster, but I would like to know the details of how R jobs running on YARN can be supported. What must be installed on the gateway/edge node, and what must be installed on the YARN nodes?
According to Cloudera's hubbarja in https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/install-sparkr-on-cdh-5-8/td-p/459..., SparkR is not officially supported in CDH. Is there any way to install it through Parcel management? And if there is no internet access, can we package the R libraries manually for an internal R package installation?
If anyone has any comments, they would be very much appreciated.