02-13-2018 07:14 PM - last edited on 02-14-2018 05:56 AM by cjervis
I have a 3-node cluster running Cloudera 5.9. Spark runs on YARN.
I am trying to leverage the Spark benefits, but what is the correct way of running R code over Spark?
I installed SparkR on my cluster and tried running the jobs like this:
$SPARK_HOME/bin/sparkR <Path to my R code>/code.R
However, I suspect this does not take full advantage of Spark's in-memory processing or speed: no applications or jobs appear in the Spark UI or the history server.
Do you think using sparklyr would solve the problem? I found this: https://www.datacamp.com/community/blog/new-course-introduction-to-spark-in-r-using-sparklyr
Spark gurus, please suggest the best way.