Hi, if I install HDP 2.4.x from scratch and I want to use SparkR, do I also need to install an R distribution on the worker nodes?
Without R installed, I'm able to use spark-shell from a node outside the cluster and everything seems fine (the Pi estimation example runs without problems).
I'm also able to use the sparklyr library and do stuff, but if I understand correctly, that's because sparklyr performs some kind of translation of the R code into something like Scala.
When I tried to use the SparkR library, it seems to search for the Rscript executable on the cluster. Do I need to install it on the worker or master nodes of the cluster? Or is it a problem with the cluster configuration? Or is it a problem with the "submitter" machine outside the cluster?
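For context on the symptom described above: SparkR starts an R worker process on the executors, so it looks for `Rscript` on each node's PATH. A minimal sketch of a local check you could run on any node (the hostname and paths here are just what your environment happens to have, nothing HDP-specific):

```shell
# Quick check: is the Rscript executable that SparkR looks for
# present on this node's PATH?
if command -v Rscript >/dev/null 2>&1; then
  echo "Rscript found: $(command -v Rscript)"
else
  echo "Rscript missing on $(hostname)"
fi
```

If R is installed in a non-default location, Spark's `spark.r.command` configuration property can point the workers at the right executable, e.g. `spark-submit --conf spark.r.command=/usr/local/bin/Rscript yourscript.R` (the script name and path are placeholders).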
From what I understand, you need R on all nodes. There's some additional info here, depending on your scenario:
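To make that concrete, here is a minimal provisioning sketch, assuming the nodes run CentOS/RHEL (the usual base for HDP); on that platform the R packages come from the EPEL repository. Package names and repositories may differ on your OS:

```shell
# Run on every worker node (and on any node you submit from).
# Assumption: CentOS/RHEL with EPEL available for the R packages.
sudo yum install -y epel-release
sudo yum install -y R

# Verify that the executable SparkR searches for is now present:
Rscript --version
```

After installing, re-run the SparkR job; the "Rscript not found" style of failure should disappear once every executor node can resolve `Rscript`.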