
Multiple Spark versions on a Kerberos cluster


In a previous post on multiple Spark versions (already solved), it was explained how to run multiple Spark versions on the same cluster. I commented there, but I'm porting the question here since that thread is resolved.


Although that post focuses on Spark 2.x, I adapted it to run Spark 1.6.3 on a CDH 5.4 cluster (which ships Spark 1.3), and it went fine.


Now I am trying to do the same on a CDH 5.5.4 cluster with Kerberos enabled, and that's where the problem appears.


There, I installed Spark 1.6.3 just as I did on the non-Kerberos cluster. I can launch an app with the standard installation (1.5) as expected, simply by obtaining a Kerberos ticket and running spark-submit.

But when I do the same with 1.6.3, it fails. Same principal, same realm, everything the same.
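For reference, the two invocations look roughly like this (the principal, keytab path, install path, and application jar are placeholders, not my actual values):

```shell
# Obtain a Kerberos ticket for the submitting user (placeholder principal/keytab)
kinit -kt /path/to/user.keytab user@EXAMPLE.REALM

# Works: stock Spark 1.5 installation from CDH
spark-submit \
  --master yarn-cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar

# Fails: parallel Spark 1.6.3 installation, same ticket, same arguments
/opt/spark-1.6.3/bin/spark-submit \
  --master yarn-cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar
```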


In the logs I can see two main differences. One is that HiveConf emits a warning on 1.6.3 that it does not on 1.5.0:



INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token xxx for <user> on ha-hdfs:nameservice1
WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
INFO hive.metastore: Trying to connect to metastore with URI thrift://<url:port>
INFO hive.metastore: Connected to metastore



Later on, it uploads some resources apparently fine:



INFO yarn.Client: Uploading resource file:/... -> hdfs://nameservice1/... 


...but after the resources are uploaded, it submits the application and the error appears:



INFO yarn.Client: Submitting application 67672 to ResourceManager
INFO impl.YarnClientImpl: Submitted application application_...
INFO yarn.Client: Application report for .. ( state: FAILED)
client token: N/A
diagnostics: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (HDFS_DELEGATION_TOKEN xxxx for yyyy)



@srowen had mentioned before that some tweaking was necessary when dealing with the Hive metastore. Could that be related to this Kerberos authentication problem? I have copied the hive-conf as suggested, but apparently it is not being picked up correctly.
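For completeness, this is roughly how I pointed the parallel 1.6.3 installation at the cluster configuration (the directory names are the CDH defaults on my cluster; the 1.6.3 install path is a placeholder):

```shell
# Make the custom Spark use the cluster's Hadoop/YARN client configs
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf

# Copy the Hive client config so Spark SQL can reach the secured metastore
cp /etc/hive/conf/hive-site.xml /opt/spark-1.6.3/conf/
```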


I'm afraid I don't fully understand the relationship between Spark, Hive, and Kerberos here. Any hint or direction to head in would be greatly appreciated.


Note that this happens with every app, even when launching the SparkPi example in yarn-cluster mode.

In all cases I am launching the app with spark-submit without explicit Kerberos settings (e.g. --principal, --keytab). I have also tried passing them, to no avail; nor were they necessary for running fine on 1.5.0.
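One of the variants I tried, passing the credentials explicitly so YARN can renew the delegation token itself, also fails on 1.6.3 (principal, keytab, and install paths are placeholders; the examples jar name follows the usual Spark 1.x layout):

```shell
/opt/spark-1.6.3/bin/spark-submit \
  --master yarn-cluster \
  --principal user@EXAMPLE.REALM \
  --keytab /path/to/user.keytab \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark-1.6.3/lib/spark-examples-*.jar 10
```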
