
RStudio to HDP Cluster 2.4

Hi,

I am a beginner in this area, so the question might be very basic.

We have an HDP 2.4 cluster with Kerberos enabled. We are planning to set up a client node outside the cluster with RStudio Server installed on it. The objective is to use R in combination with Spark (in the cluster).

The analyst writes R code in RStudio and uses libraries like SparkR or sparklyr to connect to the HDP cluster.

The Hortonworks documentation lists installing R on all nodes as a prerequisite.

The confusing part for me is :

1) I was under the impression that R is required on all cluster nodes only if you plan to use SparkR directly from the HDP cluster nodes (./bin/sparkR).

Is that also the case when we use RStudio from a client node?

2) Assume we use the following to connect from RStudio to the cluster:

sc <- spark_connect(master = "spark://IPaddress:port", ...)

How do I authenticate to my Kerberized cluster from the RStudio code?

Thanks

NThomas

1 ACCEPTED SOLUTION

@Nikkie Thomas

The user has to run kinit from the RStudio web UI on a daily basis (the Kerberos ticket typically expires after about 24 hours).

To run kinit: log into RStudio --> Tools --> Shell --> run kinit.

Once the TGT (ticket-granting ticket) is generated, the user can do whatever they want on the Kerberized Hadoop cluster from RStudio.
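For reference, the shell step above might look like the following. The principal name and realm are placeholders; substitute the ones your KDC administrator gave you.

```shell
# Run inside the RStudio shell (Tools --> Shell).
# 'analyst@EXAMPLE.COM' is a hypothetical principal; use your own.
kinit analyst@EXAMPLE.COM     # prompts for the Kerberos password
klist                         # confirm the ticket-granting ticket (TGT) was issued

# Non-interactive alternative, if you have been issued a keytab file:
# kinit -kt /path/to/analyst.keytab analyst@EXAMPLE.COM
```

Because the ticket expires, this has to be repeated (or scheduled, e.g. via cron with the keytab variant) before each day's work.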



@Nikkie Thomas

Just FYI: open-source RStudio doesn't support a Kerberized Hadoop cluster; you need the licensed version for Kerberos to work.

Let me know if you are using the licensed version of RStudio.

@Divakar Annapureddy

Thanks for your response. We are using the licensed version, RStudio Server Pro. Can you throw some light on the questions posted?


@Divakar Annapureddy

Thanks for the very quick response. I will try that out.

Do you have any insights into question 1 in the original post?

We installed R only on the client nodes for the Tez engine; I think it is the same for the Spark engine.

OK, so my understanding was correct: we wouldn't need R installed on all the cluster nodes. Only the client node with RStudio Server needs it (and it will definitely have it)?

I think so, yes; we installed it only on the client nodes. All my R scripts are also written to run on a single node from the local Linux machine, i.e. they are not written MapReduce-style to execute across all the nodes.
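Putting the replies together, a connection from RStudio Server on the client node might be sketched as below. This is only a sketch, not tested against this cluster: it assumes a valid TGT already exists (from kinit), that the HDP Spark and Hadoop client configs are installed on the client node, and that jobs are submitted through YARN. The paths and queue name are typical HDP defaults, not confirmed values from this thread.

```r
library(sparklyr)

# Point sparklyr at the HDP Spark client on this node (typical HDP layout).
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")

config <- spark_config()
config$spark.yarn.queue <- "default"  # hypothetical queue name

# On a Kerberized HDP cluster, submitting through YARN lets the job
# pick up the Kerberos TGT obtained earlier with kinit.
sc <- spark_connect(master = "yarn-client", config = config)

# ... run dplyr/sparklyr operations against cluster data here ...

spark_disconnect(sc)
```

Note that with YARN submission there is no spark://IPaddress:port standalone master involved; authentication rides on the Kerberos ticket in the user's session.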


@Nikkie Thomas

You may want to read this and follow the links provided there: https://spark.rstudio.com/guides/connections/#kerberos