
RStudio to HDP Cluster 2.4


Hi,

I am a beginner in this area, so the question might be very basic.

We have an HDP 2.4 cluster with Kerberos enabled. We are planning to set up a client node outside our cluster with RStudio Server installed on it. The objective is to use R in combination with Spark (in the cluster).

The analyst writes R code in RStudio and uses libraries like SparkR or sparklyr to connect to the HDP cluster.

On reading the Hortonworks documentation, it says that a prerequisite is to have R installed on all nodes.

The confusing part for me is:

1) I was under the impression that R is required on all nodes in my cluster if you plan to use SparkR from the HDP cluster nodes (./bin/sparkR).

Is that also the case when we use RStudio from a client node?

2) Assume we use the following to connect from RStudio to the cluster:

sc <- spark_connect(master = "spark://IPaddress:port", ...)

How do I authenticate to my Kerberized cluster from RStudio code?

Thanks

NThomas

1 ACCEPTED SOLUTION


@Nikkie Thomas

The user has to run kinit from RStudio Web on a daily basis.

To run kinit: log into RStudio --> Tools --> Shell --> run kinit.

Once you generate the TGT (ticket-granting ticket), the user can do whatever he wants on the Kerberized Hadoop cluster from RStudio.
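If a keytab is available, the kinit step can also be scripted from the R console instead of the shell. A minimal sketch; the keytab path and the principal analyst@EXAMPLE.COM are hypothetical placeholders, not values from this thread:

# Obtain a Kerberos TGT before touching the cluster.
# Keytab path and principal are placeholders; substitute your own.
system("kinit -kt /home/analyst/analyst.keytab analyst@EXAMPLE.COM")

# Confirm the ticket was issued.
system("klist")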


REPLIES


@Nikkie Thomas

Just FYI: open-source RStudio doesn't support a Kerberized Hadoop cluster. We need a license in order to work with a Kerberized cluster.

Let me know if you are using the licensed version of RStudio.


@Divakar Annapureddy

Thanks for your response. We are using the licensed version, RStudio Server Pro. Can you shed some light on the questions posted?


@Divakar Annapureddy

Thanks for the very quick response. I will try that out.

Do you have any insights into question 1 in the original post?


We installed R only on the client nodes for the Tez engine; I think it is the same for the Spark engine.


OK, so my understanding was correct. We wouldn't need to have R installed on all the cluster nodes; only the client node with RStudio Server needs it (and it will definitely have it)?


I think so, yes; we installed it only on the client nodes. All my R scripts are also written to run on a single node from local Linux, i.e., they are not written in MapReduce fashion to execute on all the nodes using Hadoop.


@Nikkie Thomas

You may want to read this and follow the links provided there: https://spark.rstudio.com/guides/connections/#kerberos
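For completeness, here is a minimal sparklyr connection sketch for Spark on YARN, which is how Spark typically runs on an HDP cluster. The SPARK_HOME and HADOOP_CONF_DIR paths and the Spark version are assumptions based on a typical HDP 2.4 client-node layout, not values confirmed in this thread; verify them on your own node:

library(sparklyr)

# Assumed HDP client-node locations; adjust as needed.
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")

# With a valid Kerberos ticket (see the kinit step above),
# connect through YARN rather than a spark:// standalone master.
sc <- spark_connect(master = "yarn-client", version = "1.6.1")

# ... run sparklyr/dplyr work here, then disconnect.
spark_disconnect(sc)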