Zeppelin Livy2 interpreter running Hive queries gets org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

After integrating all our cluster's services with Active Directory I was forced to start using the Livy2 interpreter for Zeppelin.

Everything is now working as before, except that the Livy2 interpreter cannot execute Hive queries:

%livy2

val testDF = spark.sql("SELECT * FROM business_process_management.awd_bpm_link")
testDF.show()

Output:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, Node107FQDN, executor 1): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "Node107FQDN/Node107IPAddress"; destination host is: "PrimaryNameNodeFQDN":8020;
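My (unconfirmed) understanding is that in cluster mode the executors do not hold a Kerberos ticket themselves; they rely on HDFS delegation tokens that are fetched when the job is submitted, so an error like this on an executor suggests no token was obtained for that NameNode. If that is the right track, Spark 2.1 has a property for listing the secure NameNodes that tokens should be fetched for. I have not tried this yet; the value below is just my NameNode as a placeholder:

```
# spark-defaults.conf (speculative -- I have not confirmed this fixes it)
spark.yarn.access.namenodes hdfs://PrimaryNameNodeFQDN:8020
```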

This works:

%livy2
spark.version

Output:
res0: String = 2.1.1.2.6.2.0-205

Everything else works as before when logged in as an Active Directory user that has been granted access to the relevant folders and databases via Ranger:

  • Hive View 2.0 via Ambari
  • Scala code using spark to query hive (read and write) via spark-shell, spark-submit and Scala applications running in Oozie workflow actions
  • Zeppelin's spark2 interpreter querying databases to which the zeppelin user has been granted access.

I can post more information about configuration settings here, but I am not sure which settings are actually relevant. Since everything else is working as expected, here are my Livy2 interpreter settings:

Properties (blank values are unset):

livy.spark.deploy_mode = cluster
livy.spark.dynamicAllocation.cachedExecutorIdleTimeout =
livy.spark.dynamicAllocation.enabled =
livy.spark.dynamicAllocation.initialExecutors =
livy.spark.dynamicAllocation.maxExecutors =
livy.spark.dynamicAllocation.minExecutors =
livy.spark.executor.cores =
livy.spark.executor.instances =
livy.spark.executor.memory = 8GB
livy.spark.jars = [path to jar file],[path to another jar file]
livy.spark.jars.packages =
zeppelin.interpreter.localRepo = /usr/hdp/current/zeppelin-server/local-repo/2DSXMN1CA
zeppelin.interpreter.output.limit =
zeppelin.livy.concurrentSQL = false
zeppelin.livy.displayAppInfo = true
zeppelin.livy.keytab = /etc/security/keytabs/zeppelin.server.kerberos.keytab
zeppelin.livy.principal = zeppelin-clustername@OUR.REALM.COM
zeppelin.livy.pull_status.interval.millis =
zeppelin.livy.session.create_timeout =
zeppelin.livy.spark.sql.field.truncate = true
zeppelin.livy.spark.sql.maxResult =
zeppelin.livy.url = http://LivyNodeFQDN:8999
zeppelin.spark.useHiveContext = true
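If it helps narrow things down: from what I have read in the Livy documentation, a Kerberized setup also involves impersonation settings on the Livy server itself, plus a proxy-user whitelist on the Hadoop side. These are reproduced from the docs, not from my cluster, so treat them as a sketch of what should be in place:

```
# livy.conf (per the Livy docs; values are illustrative)
livy.server.auth.type = kerberos
livy.impersonation.enabled = true

# core-site.xml, allowing the livy service user to proxy end users
hadoop.proxyuser.livy.groups = *
hadoop.proxyuser.livy.hosts = *
```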

Any help will be appreciated, and if I find a solution I will post it here.

I am also planning to document the process of getting a Hortonworks cluster to actually work and to post it online. I hope I can spare everyone else the pain I have gone through so far just to get Hortonworks working like a data lake should.
