How to run Spark2 jobs on Altus using Hive/SDX?

I am trying to run Spark2 jobs on Altus, on a 5.12 secure cluster with SDX configured.

 

I am trying to submit Hive queries using:

sparkSession.sql("...")
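For context, the driver builds the session roughly like this (a minimal sketch, not my exact code; the app name and query are placeholders):

import org.apache.spark.sql.SparkSession

// Hive support has to be enabled so sql() goes through the Hive catalog
val sparkSession = SparkSession.builder()
  .appName("altus-hive-job")   // placeholder app name
  .enableHiveSupport()
  .getOrCreate()

sparkSession.sql("SHOW DATABASES").show()   // placeholder query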

If I enable SDX and submit a Spark2 job, the job succeeds, but it uses a local Derby Hive metastore because it does not pick up hive-site.xml:

 

2018-11-12 12:46:44,175 [Driver] INFO org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 1.1.0) is file:/data0/yarn/usercache/altus/appcache/application_1542020573202_0008/container_1542020573202_0008_01_000001/spark-warehouse
2018-11-12 12:46:54,468 [Driver] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
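A couple of sanity checks I used to confirm this from the driver (a sketch; these are standard Spark/Hadoop config keys):

// Warehouse dir: a container-local spark-warehouse path means hive-site.xml was not picked up
println(sparkSession.conf.get("spark.sql.warehouse.dir"))

// hive.metastore.uris: null here means the job never saw the real metastore address
println(sparkSession.sparkContext.hadoopConfiguration.get("hive.metastore.uris"))

// Against the local Derby metastore this only returns "default"
sparkSession.sql("SHOW DATABASES").show()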

 

Printing out the Hadoop configuration for the job, I see:

Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml
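Concretely, that is just the toString of the driver's Hadoop Configuration (a sketch of how I print it; Configuration.toString() lists the XML resources it was loaded from):

// hive-site.xml is missing from the resource list printed above
println(sparkSession.sparkContext.hadoopConfiguration.toString)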

It seems like the Oozie Spark action isn't adding the hive-site.xml.

So I added hive-site.xml using --files in the Altus job configuration. This picked up the metastore configuration, but now the job fails to obtain a Kerberos TGT:

2018-11-12 14:34:12,372 [Driver] INFO  hive.metastore  - Trying to connect to metastore with URI thrift://xxxxxxxxxxxxxxx:9083
2018-11-12 14:34:12,406 [Driver] ERROR org.apache.thrift.transport.TSaslTransport  - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
	at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
	at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
	at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)

I did see spark.yarn.security.tokens.hive.enabled=false in the job logs by default, so I tried setting it to true, but I still get the same Kerberos TGT error.
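In spark2-submit terms, what I am passing through the Altus job configuration amounts to roughly the following (a sketch; the hive-site path, class and jar are placeholders for my actual values):

spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /etc/hive/conf/hive-site.xml \
  --conf spark.yarn.security.tokens.hive.enabled=true \
  --class com.example.MyHiveJob \
  my-hive-job.jar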

 

I managed to open a spark2-shell on one of the nodes and proxy as the altus user, and I was able to submit Hive queries to the correct metastore without issues, so this looks like an Oozie + Spark2 configuration problem.
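For reference, the manual test that works is roughly this (a sketch; it assumes a principal on the node that is allowed to impersonate the altus user):

# On a cluster node, after kinit with a suitable principal:
spark2-shell --master yarn --proxy-user altus

# Inside the shell, queries resolve against the real thrift metastore:
#   spark.sql("SHOW DATABASES").show()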

 

So, is there a way to use Spark2 with SDX and Altus to submit Hive queries? Are the configuration steps documented anywhere?

 

Thanks

by diceK on 11-14-2018 09:32 PM
Hi alexbush,

> If I enable SDX and submit a Spark2 job, the job succeeds, but it uses a local Derby Hive metastore because it does not pick up hive-site.xml:

How did you submit the job? Was it through the Altus web console, or some other way?

Thanks,
Daisuke
by alexbush on 11-15-2018 01:59 AM
I am submitting it through the Java SDK.

However, looking in the console, the first job shows no configuration parameters that would suggest anything about Hive.

For the second job, which failed with the TGT errors, I can see the --files parameter in the Altus console, so the settings appear to be flowing through.