Created 03-30-2017 05:40 AM
Hi,
I am a beginner in this area, so the question might be very basic.
We have an HDP 2.4 cluster with Kerberos enabled. We are planning to set up a client node outside our cluster with RStudio Server installed on it. The objective is to use R in combination with Spark (in the cluster).
The analyst writes R code in RStudio and uses libraries like SparkR or sparklyr to connect to the HDP cluster.
On reading the Hortonworks documentation, it says that having R installed on all nodes is a prerequisite.
The confusing part for me is:
1) I was under the impression that R is required on all nodes in my cluster if you plan to use SparkR from the HDP cluster nodes (./bin/sparkR).
Is that also the case when we use RStudio from a client node?
2) Assume we use the following to connect from RStudio to the cluster:
sc <- spark_connect(master = "spark://IPaddress:port", ...)
How do I authenticate to my Kerberized cluster from RStudio code?
Thanks
NThomas
Created 03-30-2017 05:45 AM
Just FYI: open-source RStudio doesn't support a Kerberized Hadoop cluster; you need a license in order for it to work with a Kerberized cluster.
Let me know if you are using the licensed version of RStudio.
Created 03-30-2017 05:54 AM
Thanks for your response. We are using the licensed version, RStudio Server Pro. Can you throw some light on the questions posted?
Created 03-30-2017 06:03 AM
The user has to run kinit from RStudio Web on a daily basis.
To run kinit: log into RStudio --> Tools --> Shell --> run kinit.
Once the Kerberos TGT (ticket-granting ticket) is generated, the user can do whatever they want on the Kerberized Hadoop cluster from RStudio.
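A minimal sketch of what that looks like from within an R session, shelling out to kinit with a keytab so the ticket can be renewed without typing a password (the principal and keytab path are hypothetical placeholders, not details from this thread):

# Obtain a Kerberos TGT from within R by calling kinit with a keytab.
# Principal and keytab path are placeholders; adjust to your environment.
system2("kinit", args = c("-kt", "/home/analyst/analyst.keytab", "analyst@EXAMPLE.COM"))
# Verify that the ticket-granting ticket was issued.
system2("klist")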
Created 03-30-2017 06:15 AM
Thanks for the very quick response. I will try that out.
Do you have any insights into question 1 in the original post?
Created 03-30-2017 06:30 AM
We installed R only on the client nodes for the Tez engine; I think it is the same for the Spark engine.
Created 03-30-2017 06:40 AM
OK, so my understanding was correct: we wouldn't need to have R installed on all the cluster nodes. Only the client node with RStudio Server needs it (and it will definitely have it)?
Created 03-30-2017 06:55 AM
I think yes; we installed it only on the client nodes. All my R scripts are also written to run on one node from local Linux. I mean, they are not written in MapReduce fashion to execute on all the nodes using a Hadoop flavor.
Created 04-30-2018 01:06 PM
You may want to read this and follow the links provided there: https://spark.rstudio.com/guides/connections/#kerberos
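For reference, a minimal sparklyr connection sketch for a Kerberized HDP cluster, assuming a valid TGT already exists (run kinit first); the SPARK_HOME path and the yarn-client master below are assumptions about a typical HDP setup, not details from this thread:

library(sparklyr)

# Point sparklyr at the HDP Spark client installation (path is an assumption).
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")

# Connect through YARN; authentication rides on the Kerberos ticket
# already held by the RStudio Server session user.
sc <- spark_connect(master = "yarn-client", spark_home = Sys.getenv("SPARK_HOME"))

# ... run your Spark work against sc ...

# Disconnect when done.
spark_disconnect(sc)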