After integrating all our cluster's services with Active Directory, I was forced to start using the Livy2 interpreter for Zeppelin.
Everything is now working as before, except that the Livy2 interpreter cannot execute Hive queries:
val testDF = spark.sql("SELECT * FROM business_process_management.awd_bpm_link")
Output: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, Node107FQDN, executor 1): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "Node107FQDN/Node107IPAddress"; destination host is: "PrimaryNameNodeFQDN":8020;
Output: res0: String = 22.214.171.124.6.2.0-205
Everything else works as before when I am logged in as an Active Directory user that has been granted access to the relevant folders and databases via Ranger:
Hive View 2.0 via Ambari
Scala code using Spark to query Hive (read and write) via spark-shell, spark-submit and Scala applications running in Oozie workflow actions
Zeppelin's spark2 interpreter querying databases to which the zeppelin user has been granted access.
I can post more information about configuration settings here, but I am not sure which settings are actually relevant. Since everything else is working as expected, here are my Livy2 interpreter settings:
Any help will be appreciated, and if I find a solution I will post it here.
I am also planning to document the process of getting a Hortonworks cluster to actually work and post it online. I hope I can spare everyone else the pain I have had to go through so far, just to get Hortonworks to work the way a data lake should.