Support Questions

Find answers, ask questions, and share your expertise

How do I Setup R (SparkR) for Spark2?

I have RStudio installed and running on my edge node. I installed R on 6 datanodes running Spark2. I have several questions.

  1. I have R version 3.4.1 (2017-06-30) -- "Single Candle" installed on the datanodes. Do I need to set HOME directories on the datanodes, or do I need other programs installed? (Running the latest HDP.)
  2. I have run sparklyr before, and it creates a YARN job when running. Is SparkR different from sparklyr?
  3. What packages must be installed on the R server when running SparkR?
  4. Is there a good step-by-step guide on setting up and running SparkR?

Expert Contributor

Hi Clay,

1. Once you install R, SparkR should just pick it up from the default location. I don't think there is a need to set up HOME directories.

2. SparkR can run on yarn or in local mode, depending on how you submit the actual job.
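To make the local-vs-YARN distinction concrete, here is a minimal sketch of starting a SparkR session in each mode. Assumptions: Spark 2.x SparkR API, and an HDP-style `SPARK_HOME` of `/usr/hdp/current/spark2-client` (adjust to your install).

```r
# Point R at the Spark 2 client install if SPARK_HOME isn't already set
# (the path below is an assumption for a typical HDP layout).
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark2-client")
}
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

# Local mode: driver and executors all run on this one node.
sparkR.session(master = "local[2]", appName = "sparkr-local-test")

# YARN client mode instead: this creates a YARN application visible in the
# ResourceManager UI, just like a sparklyr job.
# sparkR.session(master = "yarn", appName = "sparkr-yarn-test")
```

So SparkR and sparklyr are two different R front ends to the same Spark engine; whether either shows up as a YARN job depends on the `master` you submit with, not on which package you use.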

3. You need R and its dependencies installed. You can add additional packages as needed from inside your SparkR jobs; I think you just use the install.packages command.
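A short sketch of that: calling `install.packages()` from an R session on the driver. The package name and CRAN mirror below are illustrative assumptions. One caveat worth knowing: this only installs on the node where the R session runs, so any package used inside executor-side functions such as `spark.lapply()` or `dapply()` must also be installed on every worker (e.g. all 6 datanodes).

```r
# Install a package for use in SparkR code on this node
# ("data.table" and the mirror URL are illustrative, not required).
install.packages("data.table", repos = "https://cran.r-project.org")

# Caveat: the call above installs only locally. Code shipped to executors,
# e.g. via spark.lapply(), needs the package present on every worker node:
# results <- spark.lapply(1:6, function(i) {
#   library(data.table)  # must already be installed on the datanode
#   ...
# })
```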

4. This link is a good starting point; it has internal links that give details on how Spark and R integrate.

So the link to the Spark Component Guide refers to Spark version 1.6.3. Are the steps different for Spark2?

Expert Contributor

@Clay McDonald The steps should be the same.