I am trying to run Spark2 jobs in Altus on CDH 5.12, with a secure cluster and a configured SDX, and to submit Hive queries from those jobs.
If I enable SDX and submit a Spark2 job, the job succeeds, but it uses a local Derby Hive metastore because it does not pick up hive-site.xml:
2018-11-12 12:46:44,175 [Driver] INFO org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 1.1.0) is file:/data0/yarn/usercache/altus/appcache/application_1542020573202_0008/container_1542020573202_0008_01_000001/spark-warehouse
2018-11-12 12:46:54,468 [Driver] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
Printing out the Hadoop configuration for the job, I see:
Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml
Note that hive-site.xml is missing, so it seems the Oozie Spark action isn't adding it.
So I added hive-site.xml using --files in the Altus configuration. This picked up the metastore configuration, but it now fails on the Kerberos TGT:
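For reference, this is roughly how I passed the file; a sketch only, since Altus generates the actual spark2-submit. The hive-site.xml path, class name, and jar are placeholders for my job's values:

```shell
# Ship hive-site.xml with the job so the driver sees the real metastore
# config instead of falling back to local Derby.
# /etc/hive/conf/hive-site.xml is the gateway-node path in my case.
spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /etc/hive/conf/hive-site.xml \
  --class com.example.MyHiveJob \
  my-hive-job.jar
```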
2018-11-12 14:34:12,372 [Driver] INFO hive.metastore - Trying to connect to metastore with URI thrift://xxxxxxxxxxxxxxx:9083
2018-11-12 14:34:12,406 [Driver] ERROR org.apache.thrift.transport.TSaslTransport - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
I did see spark.yarn.security.tokens.hive.enabled=false in the job logs by default, so I tried setting it to true, but I still get the same Kerberos TGT error.
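This is how I attempted the override; again a sketch, with placeholder class and jar names, not the exact Altus-generated command:

```shell
# Ask Spark to fetch a Hive metastore delegation token for the job
# (it was logged as false by default). Did not resolve the TGT error for me.
spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.security.tokens.hive.enabled=true \
  --files /etc/hive/conf/hive-site.xml \
  --class com.example.MyHiveJob \
  my-hive-job.jar
```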
I managed to open a spark2-shell on one of the nodes, proxy as the altus user, and submit Hive queries to the correct metastore without issues, so this looks like an Oozie+Spark2 configuration problem.
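For completeness, the interactive test that did work looked roughly like this (sketched from memory; assumes a valid Kerberos ticket for a principal allowed to proxy):

```shell
# On a cluster node, with a kinit'd superuser that may impersonate altus:
spark2-shell --proxy-user altus

# Then inside the shell, queries hit the real (thrift) metastore:
#   scala> spark.sql("show databases").show()
```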
So, is there a way to use Spark2 with SDX on Altus to submit Hive queries? Are the configuration steps documented anywhere?