Master
Posts: 430
Registered: ‎07-01-2015

Standalone spark 2.1 with Kerberized CDH

Hi,  

I managed to set up and configure Spark 2.1 with CDH 5.9 (pointing to the Hadoop configuration directories), but I can't find which specific settings I should change in spark-env.sh to be able to access a Kerberized HDFS.

I tried to launch a shell with

spark-shell --master=spark://IP:PORT --keytab <path_to_keytab> --principal <principal@REALM>

 

However, reading files fails because Spark cannot authenticate to the NameNode, which evidently requires a token:

 

Caused by: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "cl1deb03dn/10.0.0.6"; destination host is: "cl1deb01nn.lab.hadoop.cloudapp.net":8020;

 

Should I create specific principals in the KDC for the master and the slaves? If so, where should I place the keytabs, and where do I configure which principal to use and where its keytab file is located?

 

Thanks

 

 

Cloudera Employee
Posts: 97
Registered: ‎05-10-2016

Re: Standalone spark 2.1 with Kerberized CDH

Instead of setting properties in spark-env.sh, you may want to set the HADOOP_CONF_DIR environment variable to point to the configuration files for the NameNode and YARN. Cloudera Manager can help manage these configuration files and even distribute them to servers configured as gateway nodes.
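
As a rough sketch of what that could look like on a gateway node, you could export the variable and then check that the client configuration it points to actually declares Kerberos authentication. The /etc/hadoop/conf default and the hadoop_auth_mode helper are assumptions for illustration, not something taken from your setup, so adjust the path to wherever your client configuration is deployed.

```shell
# Assumed location of the deployed client configuration; adjust as needed.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
export HADOOP_CONF_DIR

# Hypothetical helper: print the value of hadoop.security.authentication
# found in core-site.xml under the given directory, or "missing" if the
# file is not there. Uses plain text matching, not a real XML parser.
hadoop_auth_mode() {
  conf_file="$1/core-site.xml"
  if [ ! -f "$conf_file" ]; then
    echo "missing"
    return 1
  fi
  # Join the file into one line so name and value can be matched together.
  tr -d '\n' < "$conf_file" \
    | grep -o '<name>hadoop.security.authentication</name>[^<]*<value>[^<]*</value>' \
    | sed 's/.*<value>\([^<]*\)<\/value>.*/\1/'
}
```

Running hadoop_auth_mode "$HADOOP_CONF_DIR" should print kerberos if the directory holds the secured cluster's client configuration. If it prints simple, nothing, or missing, Spark is probably not picking up the Kerberized settings.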