Explorer
Posts: 19
Registered: ‎02-11-2016
Accepted Solution

Oozie - Spark Action - Hive

I'm having trouble getting a real Hive context in a Spark Scala application (jar) that runs as an Oozie Spark action. The Spark app writes to HDFS folders just fine, but it cannot see the tables that I see in the Hue Hive editor. It seems to be creating a new metastore somewhere instead of using the existing one. I have tried including hive-site.xml in various places, but to no effect. I've tried the following locations:

 

  • The job xml for the spark action
  • The job xml for the workflow
  • A file tag in the workflow.xml for the spark action
  • etc.

 

I have run the code successfully many times in spark-shell. I probably configured one of these locations incorrectly.

 

Any thoughts on what I am missing?

New Contributor
Posts: 3
Registered: ‎05-13-2016

Re: Oozie - Spark Action - Hive

I'm facing exactly the same issue. Trying to run a Spark job that is using a HiveContext from an Oozie Spark action results in the job failing to connect to the Hive metastore. I also tried adding the hive-site.xml in the various mentioned places to no avail. So where would be the right place to configure the Oozie Spark action to play nicely with Hive?
Explorer
Posts: 19
Registered: ‎02-11-2016

Re: Oozie - Spark Action - Hive

I'm off to other areas for the time being due to timeboxing, but I will need to figure this out eventually. I did find a few other posts on the web from people having the same problem, but no solutions. Good luck.

If/when I get back to this and figure it out, I'll post a solution.

New Contributor
Posts: 3
Registered: ‎05-13-2016

Re: Oozie - Spark Action - Hive

I found a solution, even though it is not the prettiest.

 

In the Spark job, before creating the SparkContext, you need to set a system property for the Hive metastore URI like so:

 

System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083");

I have tried setting this through the Oozie configuration but to no avail. So far, this was the only way to make it work. 
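
For anyone wanting to see the ordering in context, here is a minimal sketch of the workaround in a Scala driver. It assumes the Spark 1.x `HiveContext` API; the object name `HiveFromOozie`, the app name, and the `SHOW TABLES` check are illustrative only, and the metastore host is a placeholder you need to fill in.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical driver object; the name is illustrative only.
object HiveFromOozie {
  def main(args: Array[String]): Unit = {
    // Set the metastore URI BEFORE the SparkContext is created.
    // Replace the placeholder with your own metastore host.
    System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083")

    val conf = new SparkConf().setAppName("hive-from-oozie")
    val sc = new SparkContext(conf)

    // Spark 1.x entry point for Hive-backed tables.
    val hiveContext = new HiveContext(sc)

    // Sanity check: this should list the same tables seen in the Hue Hive editor.
    hiveContext.sql("SHOW TABLES").show()

    sc.stop()
  }
}
```

If the listing comes back empty or shows only a fresh `default` database, the job is probably still talking to a local metastore rather than the shared one.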

Explorer
Posts: 19
Registered: ‎02-11-2016

Re: Oozie - Spark Action - Hive

Perfect! I added it and can see the right hive env now. It is pretty enough. Thanks!

Hopefully we can figure out how to do it through Oozie later. But, I'm happy...
New Contributor
Posts: 3
Registered: ‎05-13-2016

Re: Oozie - Spark Action - Hive

Found another way of achieving this which also works for PySpark in an Oozie Spark action.

 

Add this to the <spark-opts> tag in the action definition:

--conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://<your-hive-metastore>:9083

This will add the metastore URI to the application master environment and should allow successful connection to Hive for using tables inside a PySpark script.
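
A rough sketch of where that option sits in a workflow action, assuming the `uri:oozie:spark-action:0.1` schema. The action name, the transition targets, the script path, and the `${hiveMetastoreHost}` property are hypothetical placeholders, not values from a real workflow; the property would be defined in job.properties.

```xml
<action name="pyspark-hive">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>pyspark-hive</name>
        <!-- For PySpark, the jar element points at the Python script (placeholder path) -->
        <jar>${nameNode}/path/to/script.py</jar>
        <!-- hiveMetastoreHost is a hypothetical property holding the metastore host name -->
        <spark-opts>--conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://${hiveMetastoreHost}:9083</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```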