How to run Spark2 jobs on Altus using Hive/SDX?

I am trying to run Spark2 jobs in Altus on a secure 5.12 cluster with SDX configured.


I am trying to submit Hive queries using:


If I enable SDX and submit a Spark2 job, the job succeeds, but it uses a local Derby Hive metastore because it does not pick up hive-site.xml:


2018-11-12 12:46:44,175 [Driver] INFO org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 1.1.0) is file:/data0/yarn/usercache/altus/appcache/application_1542020573202_0008/container_1542020573202_0008_01_000001/spark-warehouse
2018-11-12 12:46:54,468 [Driver] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
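For anyone checking their own driver logs for the same symptom, here is a minimal sketch that flags the two signs shown above: a warehouse location on the local filesystem and a Derby-backed metastore. The function name is hypothetical, and the markers are simply copied from these log lines, so other Spark or Hive versions may word them differently.

```python
import re

# Markers copied from the driver log lines above (assumption: other versions may differ).
DERBY_MARKER = "underlying DB is DERBY"
WAREHOUSE_RE = re.compile(r"Warehouse location for Hive client .* is (\S+)")


def local_metastore_signs(log_lines):
    """Return reasons the log suggests a local, embedded metastore (hypothetical helper)."""
    signs = []
    for line in log_lines:
        if DERBY_MARKER in line:
            signs.append("metastore backed by Derby")
        match = WAREHOUSE_RE.search(line)
        if match and match.group(1).startswith("file:"):
            signs.append("warehouse on local filesystem: " + match.group(1))
    return signs
```

An empty result does not prove the remote metastore is in use; it only means neither marker appeared in the lines that were scanned.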


Printing out the hadoop configuration for the job I see:

Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml

It seems like the Oozie Spark action isn't adding the hive-site.xml.
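To make that check explicit, here is a small sketch that splits the printed Configuration line into its resource names and looks for hive-site.xml. The helper name is hypothetical, and it assumes the resources are printed as a comma-separated list after the first colon, as in the output above.

```python
def loaded_resources(config_line):
    """Split a 'Configuration: a.xml, b.xml' line into resource names (hypothetical helper)."""
    # Everything after the first colon is treated as the comma-separated resource list.
    _, _, resources = config_line.partition(":")
    return [name.strip() for name in resources.split(",") if name.strip()]


line = (
    "Configuration: core-default.xml, core-site.xml, yarn-default.xml, "
    "yarn-site.xml, mapred-default.xml, mapred-site.xml, "
    "hdfs-default.xml, hdfs-site.xml"
)

# endswith() is used so that a resource printed with a full path would still match.
has_hive_site = any(name.endswith("hive-site.xml") for name in loaded_resources(line))
print(has_hive_site)  # False for the configuration printed above
```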

So I added hive-site.xml using --files in the Altus configuration. This picked up the metastore configuration, but the job now fails for lack of a Kerberos TGT:

2018-11-12 14:34:12,372 [Driver] INFO  hive.metastore  - Trying to connect to metastore with URI thrift://xxxxxxxxxxxxxxx:9083
2018-11-12 14:34:12,406 [Driver] ERROR org.apache.thrift.transport.TSaslTransport  - SASL negotiation failure GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(
	at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$
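To double-check which metastore settings a shipped hive-site.xml actually carries, a sketch like the following can list the relevant properties. The three property names are standard Hive settings, but the helper name, the sample XML, and the host metastore.example.com are placeholders for illustration, not values from my cluster.

```python
import xml.etree.ElementTree as ET

# Standard Hive property names for a remote, Kerberized metastore.
KEYS = (
    "hive.metastore.uris",
    "hive.metastore.sasl.enabled",
    "hive.metastore.kerberos.principal",
)


def metastore_settings(hive_site_xml):
    """Return the metastore-related properties found in a hive-site.xml string (hypothetical helper)."""
    root = ET.fromstring(hive_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name in KEYS:
            found[name] = prop.findtext("value", default="").strip()
    return found


# Placeholder content; a real file would be read from disk instead.
sample = """<configuration>
  <property><name>hive.metastore.uris</name><value>thrift://metastore.example.com:9083</value></property>
  <property><name>hive.metastore.sasl.enabled</name><value>true</value></property>
</configuration>"""

print(metastore_settings(sample))
```

If hive.metastore.sasl.enabled is true, the client attempts a SASL/Kerberos handshake with the metastore, which is consistent with the error above when the driver has neither a TGT nor a delegation token.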

I did see in the job logs by default, so I tried to set this to true, but I still get the same Kerberos TGT error.


I managed to open a spark2-shell on one of the nodes, proxying as the altus user, and was able to submit Hive queries to the correct metastore without issues, so this looks like an Oozie+Spark2 configuration problem.


So, is there a way to use Spark2 with SDX and Altus to submit Hive queries? Are the configuration steps documented anywhere?


