Standalone Spark 2.1 with Kerberized CDH


I managed to set up and configure Spark 2.1 with CDH 5.9 (pointing to the Hadoop configuration directories), but I can't find which specific settings I should change to be able to access a Kerberized HDFS.

I tried to launch a shell with

spark-shell --master=spark://IP:PORT --keytab <path_to_keytab> --principal <principal@REALM>


But trying to read files fails because Spark cannot connect to the NameNode; evidently the NameNode requires a token...


Caused by: Failed on local exception: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "cl1deb03dn/"; destination host is: "":8020;


Should I create specific principals for the master and slaves in the KDC? If so, where should I place the keytabs, and where do I configure which principal to use and where the keytab file is?





Re: Standalone Spark 2.1 with Kerberized CDH

Instead of setting properties within Spark itself, you may want to set the HADOOP_CONF_DIR environment variable to point to the directory that holds the client configuration files for the NameNode and YARN. Cloudera Manager can help manage these configuration files and even distribute them to servers configured as gateway nodes.
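
As a rough illustration of the HADOOP_CONF_DIR suggestion, here is a minimal sketch of a sanity check you could run before exporting the variable. The function name check_hadoop_conf_dir is made up for this example, and /etc/hadoop/conf is only an assumption about where the client configuration lives on your gateway node. The check is a crude text match: it only confirms that core-site.xml and hdfs-site.xml exist and that hadoop.security.authentication is set to kerberos, and it will miss the property if other elements sit between its name and value.

```shell
#!/bin/sh
# Sketch: sanity-check a Hadoop client configuration directory before
# pointing Spark at it through HADOOP_CONF_DIR.
# check_hadoop_conf_dir is a hypothetical helper, not part of Spark or CDH.

# Returns 0 if the directory looks like a Kerberos-enabled client config,
# 1 if a file is missing, 2 if Kerberos authentication is not configured.
check_hadoop_conf_dir() {
    conf_dir="$1"

    # Both files are needed for an HDFS client to find the NameNode.
    for f in core-site.xml hdfs-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing $conf_dir/$f" >&2
            return 1
        fi
    done

    # Join the file into one line so <name> and <value> can be matched
    # even when they are on separate lines, then look for the property.
    if ! tr -d '\n' < "$conf_dir/core-site.xml" | grep -q \
        '<name>hadoop\.security\.authentication</name>[[:space:]]*<value>kerberos</value>'
    then
        echo "hadoop.security.authentication is not set to kerberos" >&2
        return 2
    fi

    return 0
}

# Example usage (the path is an assumption, adjust to your installation):
#   check_hadoop_conf_dir /etc/hadoop/conf && export HADOOP_CONF_DIR=/etc/hadoop/conf
```

If the check passes, the export line could go into conf/spark-env.sh so that the variable is set for every Spark process started on that machine.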