
Correct way to run R code over Spark




I have a 3-node cluster running Cloudera 5.9. Spark runs on YARN.


I am trying to leverage Spark's benefits, but what is the correct way to run R code over Spark?


I installed SparkR on my cluster and tried running the jobs like this:


$SPARK_HOME/bin/sparkR <Path to my R code>/code.R

However, it does not seem to fully leverage Spark's in-memory capabilities or processing speed: no application or jobs appear in the Spark UI or the history server.
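For reference, a SparkR script has to explicitly initialize a Spark session before any distributed work is launched (otherwise the script runs as plain local R and nothing shows up on YARN or the Spark UI). A minimal sketch, assuming the Spark 2.x SparkR API — on the Spark 1.6 that CDH 5.9 ships by default, you would use `sparkR.init()` instead:

```r
library(SparkR)

# Start a Spark session against YARN; without this, the script executes as
# ordinary local R and no application appears in the Spark UI or history server.
sparkR.session(master = "yarn", appName = "my-sparkr-job")

# Distribute a local data.frame to the cluster and run a simple aggregation
df <- createDataFrame(faithful)
head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))

sparkR.session.stop()
```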


Do you think using sparklyr would solve the problem? I have found some information about it.
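For context, the sparklyr workflow looks roughly like the following — a minimal sketch, assuming sparklyr and dplyr are installed and the cluster accepts yarn-client connections; `mtcars` is just a placeholder dataset:

```r
library(sparklyr)
library(dplyr)

# Connect via YARN; this launches a Spark application that is visible
# in the Spark UI and the history server.
sc <- spark_connect(master = "yarn-client")

# Copy a local data frame into Spark and run a dplyr pipeline on the cluster
mtcars_tbl <- copy_to(sc, mtcars, "mtcars")
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg)) %>%
  collect()

spark_disconnect(sc)
```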


Spark gurus, please suggest the best way.