How to run Spark2 jobs on Altus using Hive/SDX?

I am trying to run Spark2 jobs in Altus on CDH 5.12, with a secure (Kerberized) cluster and SDX configured.

 

I am trying to submit Hive queries using:

sparkSession.sql("...")
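For context, the driver code is built roughly like this (a minimal sketch; I am assuming enableHiveSupport() on the builder, and the app/table names are placeholders):

import org.apache.spark.sql.SparkSession

// Minimal sketch of the driver code; enableHiveSupport() tells Spark SQL
// to use the Hive metastore catalog rather than its built-in one.
val sparkSession = SparkSession.builder()
  .appName("altus-spark2-hive")
  .enableHiveSupport()
  .getOrCreate()

sparkSession.sql("SELECT count(*) FROM default.some_table").show()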

If I enable SDX and submit a Spark2 job, the job succeeds, but it uses a local Derby Hive metastore because it does not pick up hive-site.xml:

 

2018-11-12 12:46:44,175 [Driver] INFO org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 1.1.0) is file:/data0/yarn/usercache/altus/appcache/application_1542020573202_0008/container_1542020573202_0008_01_000001/spark-warehouse
2018-11-12 12:46:54,468 [Driver] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY

 

Printing out the Hadoop configuration for the job, I see:

Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml
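(For reference, that list comes from dumping the driver's Hadoop Configuration, roughly like this; Configuration.toString prints the XML resources it was loaded from:)

// The resource list in Configuration.toString is how I can see that
// hive-site.xml is not among the files Spark loaded for this job.
println(sparkSession.sparkContext.hadoopConfiguration.toString)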

It seems like the Oozie Spark action isn't adding hive-site.xml to the job configuration.

So I added hive-site.xml using --files in the Altus job configuration. This picked up the metastore configuration, but the job now fails on a Kerberos TGT error:

2018-11-12 14:34:12,372 [Driver] INFO  hive.metastore  - Trying to connect to metastore with URI thrift://xxxxxxxxxxxxxxx:9083
2018-11-12 14:34:12,406 [Driver] ERROR org.apache.thrift.transport.TSaslTransport  - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
	at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
	at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
	at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)

I did see spark.yarn.security.tokens.hive.enabled=false in the job logs by default, so I tried setting it to true, but I still get the same Kerberos TGT error.
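For what it's worth, a quick way to see what credentials the driver actually holds is something along these lines (a sketch using the Hadoop UserGroupInformation API, not anything from the Altus docs):

import scala.collection.JavaConverters._
import org.apache.hadoop.security.UserGroupInformation

// Dump the current UGI: whether it has a Kerberos TGT and which
// delegation tokens (if any) were shipped with the application.
val ugi = UserGroupInformation.getCurrentUser
println(s"user=${ugi.getUserName} hasKerberosCredentials=${ugi.hasKerberosCredentials}")
ugi.getCredentials.getAllTokens.asScala.foreach { token =>
  println(s"token kind=${token.getKind} service=${token.getService}")
}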

 

I managed to open a spark2-shell on one of the nodes, proxying as the altus user, and was able to submit Hive queries to the correct metastore without issues, so this looks like an Oozie + Spark2 configuration problem.
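(Verifying that from spark2-shell is just a matter of checking the resolved metastore URI and running a query, roughly like the sketch below; hive.metastore.uris is the standard Hive property and spark is the session the shell provides:)

// Confirm which metastore URI the session resolved, then query it.
println(spark.sparkContext.hadoopConfiguration.get("hive.metastore.uris"))
spark.sql("SHOW DATABASES").show()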

 

So, is there a way to use Spark2 with SDX and Altus to submit Hive queries? Are the configuration steps documented anywhere?

 

Thanks
