
Do we need to install R on every node to use SparkR through Zeppelin?

Explorer

I am trying to run a query using the sparkR interpreter in Zeppelin. We have R installed on the namenode machine and Zeppelin installed on another machine. When we fire a query using the sparkR interpreter, it doesn't display any error on the Zeppelin screen, but the interpreter log shows:

Caused by: java.io.IOException: Cannot run program "R" (in directory "."): error=2, No such file or directory

Do we need to install R on every node?
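For context, the failing paragraph is nothing exotic; a minimal sketch of the kind of SparkR paragraph that hits this (the %r binding, the Spark 2.x as.DataFrame API, and the built-in faithful dataset are just illustrative):

%r
# Zeppelin's SparkR interpreter launches an "R" process behind the scenes;
# if the R binary is not on the PATH of the machine running that interpreter,
# startup fails with the "Cannot run program R ... error=2" seen above.
df <- as.DataFrame(faithful)
head(df)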

3 Replies

Yes, you need to install R on all cluster nodes
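Once R is installed everywhere, a rough way to confirm the executors can actually start it (a sketch, assuming a working SparkR session; spark.lapply is Spark 2.x API and ships the function to R workers on the executors):

library(SparkR)
# Each element runs in an R worker on an executor; any node without R
# fails the stage with the same "Cannot run program R" error.
versions <- spark.lapply(1:8, function(i) R.version.string)
print(unique(unlist(versions)))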

Explorer

But Apache's documentation for the R interpreter says:

To run R code and visualize plots in Apache Zeppelin, you will need R on your master node (or your dev laptop).

And does the sparkR utility distribute the load across the cluster automatically, or do we need to add any properties for that?

I have the Spark interpreter configured with the property master=yarn-client, and I have set SPARK_HOME. Is this enough?
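For what it's worth, those two settings mirror what SparkR needs outside Zeppelin as well; a sketch of the equivalent standalone setup (Spark 1.x API to match yarn-client; the SPARK_HOME path is hypothetical):

# With master=yarn-client the driver runs locally and YARN distributes the
# executors across the cluster, so the load is spread automatically.
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")  # hypothetical path
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))
sc <- sparkR.init(master = "yarn-client")
sqlContext <- sparkRSQL.init(sc)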

Expert Contributor

It depends on how you use SparkR. There are two scenarios that require you to install R on all the nodes (see the sketch after this list):

* If you use an R UDF, you need to install R on all the nodes, because the UDF runs on the executor side.

* If you want to convert an R data.frame to a Spark DataFrame.
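To make the two scenarios concrete, a short sketch (Spark 2.x API, assuming a running SparkR session; mtcars is just sample data):

# Scenario 2: converting a local R data.frame to a Spark DataFrame.
df <- createDataFrame(mtcars)

# Scenario 1: an R UDF applied with dapply runs inside R workers on the
# executors, so every node hosting an executor needs R installed.
schema <- structType(structField("mpg", "double"),
                     structField("mpg_x2", "double"))
doubled <- dapply(df, function(part) {
  data.frame(mpg = part$mpg, mpg_x2 = part$mpg * 2)
}, schema)
head(doubled)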