How to support R jobs on CDH with Spark on YARN

From what I have read in the community, sparklyr is the recommended way to use R with Spark on YARN. In my case, I just want data scientists to be able to submit their R jobs to Spark on YARN, even if only in YARN client mode. The data scientists have suggested installing the R console/RStudio on a gateway/edge node of the CDH cluster, but I would like to understand in detail how R jobs can be supported on YARN. What must be installed on the gateway/edge node, and what must be installed on the YARN nodes?

According to hubbarja from Cloudera, SparkR is not officially supported in CDH. Is that correct? Is there any way to install it through parcel management? If the cluster has no internet access, can we package the R libraries manually for internal installation?


Any comments would be much appreciated.