
Correct way to run R code over Spark


Hi,

 

I have a 3-node cluster running Cloudera 5.9. Spark runs on YARN.

 

I am trying to leverage Spark's benefits, but what is the correct way of running R code over Spark?

 

I installed SparkR on my cluster and tried running my job like this:

 

$SPARK_HOME/bin/sparkR <Path to my R code>/code.R

However, I don't think it is fully leveraging Spark's in-memory capabilities or processing speed. No applications or jobs show up in the Spark UI or the history server.
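For context, here is roughly the kind of initialization I think my code.R might need so the job actually runs on YARN instead of locally. This is only a sketch assuming the Spark 2.x SparkR API (on Spark 1.6 I believe it would be sparkR.init() instead), and the app name is just a placeholder:

# Sketch of a SparkR script that submits to YARN (assumes Spark 2.x SparkR API)
library(SparkR)

# Connect to the cluster; "sparkr-test" is a placeholder app name
sparkR.session(master = "yarn", appName = "sparkr-test")

# Distribute a sample data frame and run a simple aggregation on the executors
df <- as.DataFrame(faithful)
head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))

sparkR.session.stop()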

 

Do you think using sparklyr would solve the problem? I found this: https://www.datacamp.com/community/blog/new-course-introduction-to-spark-in-r-using-sparklyr
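From what I've read, a sparklyr connection would look something like the sketch below. I'm assuming sparklyr is already installed, SPARK_HOME points at the cluster's Spark, and the table name is a placeholder; the exact master string ("yarn-client" vs. "yarn") seems to depend on the Spark/sparklyr version:

# Sketch of connecting sparklyr to Spark on YARN (client mode)
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client",
                    spark_home = Sys.getenv("SPARK_HOME"))

# Copy a local data frame to Spark and run a dplyr query on the cluster
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark")
mtcars_tbl %>% group_by(cyl) %>% summarise(avg_mpg = mean(mpg))

spark_disconnect(sc)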

 

Spark gurus, please suggest the best way.

 

Thanks,

Shilpa
