Standalone spark 2.1 with Kerberized CDH

Hi,  

I managed to set up and configure Spark 2.1 with CDH 5.9 (pointing it to the Hadoop configuration directories), but I can't find which settings I should change in spark-env.sh to be able to access a Kerberized HDFS.

I tried to launch a shell with

spark-shell --master=spark://IP:PORT --keytab <path_to_keytab> --principal <principal@REALM>

But reading files fails because Spark cannot connect to the NameNode; evidently the NameNode requires a token:

Caused by: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "cl1deb03dn/10.0.0.6"; destination host is: "cl1deb01nn.lab.hadoop.cloudapp.net":8020;
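For context, this message usually means the HDFS client had neither a Kerberos ticket nor a delegation token to offer the NameNode. A quick first check is whether the Hadoop configuration that Spark actually loads enables Kerberos, i.e. whether hadoop.security.authentication is set to kerberos in core-site.xml. The helper below is a minimal, hypothetical sketch (the function name hadoop_auth_mode is made up for illustration). Its awk parsing assumes the conventional core-site.xml layout, in which each value element follows its name element inside the same property, and it falls back to Hadoop's default of simple when the property is absent or the file cannot be read.

```shell
# Hypothetical helper: print the value of hadoop.security.authentication
# from a core-site.xml file, or "simple" (Hadoop's default) if it is not set.
# Assumes <value> follows <name> within the same <property> element.
hadoop_auth_mode() {
  _auth_mode=$(awk '
    # Remember that we are inside the authentication property.
    /<name>[ \t]*hadoop\.security\.authentication[ \t]*<\/name>/ { found = 1 }
    # Print the first <value> seen after that <name>, then stop.
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, "")
      sub(/[ \t]*<\/value>.*/, "")
      print
      exit
    }
  ' "$1")
  echo "${_auth_mode:-simple}"
}

# Example usage (assumes HADOOP_CONF_DIR points at the client configuration):
#   hadoop_auth_mode "$HADOOP_CONF_DIR/core-site.xml"
#   klist -s && echo "ticket cache OK" || echo "no valid Kerberos ticket"
```

If the helper prints kerberos, the next thing to check is the ticket itself: klist -s exits with status 0 only when it finds a valid, unexpired credentials cache for the current user, so running it as the user that launches the driver or a worker shows whether that process has anything to authenticate with.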

Should I create specific principals in the KDC for the master and slave nodes? If so, where should I place the keytabs, and where does the configuration go (which principal to use, and in which file)?

Thanks

Re: Standalone spark 2.1 with Kerberized CDH

Instead of setting individual properties in spark-env.sh, you may want to set the HADOOP_CONF_DIR environment variable to point to the client configuration files for HDFS (NameNode) and YARN. Cloudera Manager can help manage these configuration files and can even distribute them to servers configured as gateway nodes.
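For a standalone cluster, one place to set that variable is conf/spark-env.sh itself, which Spark's launch scripts source and which has to be copied to every worker machine. The fragment below is a sketch that exports only the directory, not individual Hadoop properties. It assumes Cloudera Manager has deployed the client configuration to /etc/hadoop/conf, which is typical for CDH gateway hosts but should be checked on your nodes.

```shell
# conf/spark-env.sh fragment (copy to every node, then restart master and workers).
# Assumption: Cloudera Manager deployed the HDFS/YARN client configuration
# (core-site.xml, hdfs-site.xml, yarn-site.xml) to /etc/hadoop/conf.
export HADOOP_CONF_DIR=/etc/hadoop/conf
# Only consulted when running on YARN; on CDH it normally points to the same directory.
export YARN_CONF_DIR=/etc/hadoop/conf
```

Note that Spark 2.1's spark-submit --help lists --principal and --keytab under the YARN-only options, so in standalone mode the HDFS client presumably falls back to the Kerberos ticket cache of the OS user running each process. A workaround sometimes used with standalone clusters is to run kinit -kt <path_to_keytab> <principal@REALM> as that user on the driver host and on every worker node, and to renew the ticket before it expires; treat this as an approach to evaluate, not an officially supported configuration.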