I am trying to run a query using the sparkR interpreter in Zeppelin. We have R installed on the namenode machine and Zeppelin installed on another machine. When we run a query with the sparkR interpreter, it doesn't display any error on the Zeppelin screen, but when I check the interpreter logs I see this error: `Caused by: java.io.IOException: Cannot run program "R" (in directory "."): error=2, No such file or directory`. Do we need to install R on every node?
But the Apache Zeppelin documentation for the R interpreter says:
To run R code and visualize plots in Apache Zeppelin, you will need R on your master node (or your dev laptop).
Also, does the sparkR utility distribute the load across the cluster automatically, or do we need to set any properties for that?
In the Spark interpreter I have set the property master=yarn-client, and I have set SPARK_HOME. Is this enough?
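For reference, a minimal sketch of the relevant settings in zeppelin-env.sh (the path is illustrative, not my actual installation directory):

```shell
# zeppelin-env.sh -- Spark settings picked up by the Zeppelin Spark interpreter
export SPARK_HOME=/usr/lib/spark   # illustrative path; point this at your Spark install
export MASTER=yarn-client          # same as setting master=yarn-client in the interpreter UI
```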
It depends on how you use SparkR. There are two scenarios that require installing R on all the nodes:
* If you use an R UDF, you need R on all the nodes, because the UDF runs on the executor side.
* If you want to convert an R data.frame to a Spark DataFrame (or vice versa).
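For example, an R UDF applied with SparkR's `dapply` is executed inside an R process on each executor, which is why the `R` binary must exist on every worker node. A sketch (requires a running Spark cluster; the data and column names are illustrative):

```r
library(SparkR)
sparkR.session(master = "yarn")  # or "yarn-client" on older Spark versions

# createDataFrame converts a local R data.frame to a Spark DataFrame (scenario 2)
df <- createDataFrame(data.frame(x = 1:10))

# The function passed to dapply runs in an R worker process on each executor,
# so R must be installed on every node that runs executors (scenario 1).
schema <- structType(structField("x", "integer"), structField("x2", "integer"))
result <- dapply(df, function(part) { part$x2 <- part$x * 2L; part }, schema)
head(collect(result))
```

If you only use the Spark DataFrame API from R (filters, joins, SQL) without R UDFs or local/distributed conversions, R on the driver (master) node alone is enough, which matches what the documentation you quoted says.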