05-14-2018 11:19 AM
We have a Hadoop cluster secured with Kerberos, on Cloudera version 5.43. However, Apache Spark is not managed by Cloudera: we have our own distribution package that packages and deploys Spark 2.21 onto the cluster. The reason is that Cloudera 5.43 does not support Spark 2+, and we need Spark 2.0. Until we upgrade Cloudera to the latest version, we still want to be able to use Apache Spark 2+, so we created a custom package.
Now we have enabled Kerberos on the production Hadoop cluster and are trying to spark-submit after doing a kinit, and it is not working. We are able to submit jobs on the cluster with a valid keytab file and principal, but when we just do a kinit, the ticket generated in the kinit session is not used to authenticate the job; instead Spark asks for a keytab file and throws an authentication error. Why is that so?
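To make the two cases concrete, this is roughly what we are doing (the principal, keytab path, jar, and class name below are placeholders, not our real values; --principal and --keytab are the standard spark-submit options for YARN):

```shell
# Case 1: works — keytab and principal passed explicitly to spark-submit
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal myuser@EXAMPLE.COM \
  --keytab /path/to/myuser.keytab \
  --class com.example.MyApp \
  myapp.jar

# Case 2: fails — only a kinit ticket in the user's credential cache
kinit myuser@EXAMPLE.COM   # obtains a TGT into the ticket cache
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  myapp.jar                # this is where we hit the authentication error
```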
Is there a way to avoid passing the keytab/principal explicitly and just use the kinit session of the user submitting the job? I know that when Cloudera manages Spark, we are able to submit Spark jobs with just a kinit, without a keytab file. My question is: what does Cloudera do differently to allow kinit when it is managing the application, versus otherwise?
Also, in our case, why is it essential to have a specific keytab file and pass it as arguments to spark-submit / spark-shell?
Please provide some context around how Spark authenticates using this data, and why it can't use kinit in this case. I am trying to understand the system better so we can devise an appropriate solution that meets the demands at the moment.