
Oozie - Spark Action - Hive

Contributor

I'm having problems getting a real Hive context in a Spark/Scala application (jar) running as an Oozie Spark action. The Spark app writes to HDFS folders just fine, but it cannot see the same tables that I see in the Hue Hive editor; it seems to be creating a new metastore somewhere. I have tried including the hive-site.xml in various places, to no effect:

 

  • The job xml for the spark action
  • The job xml for the workflow
  • A file tag in the workflow.xml for the spark action
  • etc.
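For reference, the third approach (a file tag in the workflow.xml) looks roughly like the sketch below. The application name, class, and HDFS paths are placeholders, not from the original post:

```xml
<workflow-app name="spark-hive-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>MySparkHiveApp</name>
            <class>com.example.MySparkHiveApp</class>
            <jar>${nameNode}/user/oozie/apps/my-spark-hive-app.jar</jar>
            <!-- Ship hive-site.xml into the container's working directory -->
            <file>${nameNode}/user/oozie/apps/hive-site.xml#hive-site.xml</file>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```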

 

I have run the code successfully many times in spark-shell, so I may simply have placed the hive-site.xml incorrectly in one of those locations.

 

Any thoughts on what I am missing?

1 ACCEPTED SOLUTION

New Contributor

I found a solution, even though it is not the prettiest.

 

In the Spark job, before creating the SparkContext, you need to set a system property for the Hive metastore URI like so:

 

System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083");

I have tried setting this through the Oozie configuration but to no avail. So far, this was the only way to make it work. 
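In context, a minimal sketch of the fix might look like the following. This assumes the Spark 1.x API (where HiveContext exists); the metastore host remains a placeholder, and the SHOW TABLES query is only an illustrative check:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SparkHiveJob {
  def main(args: Array[String]): Unit = {
    // Must be set BEFORE the SparkContext is created; otherwise the job
    // falls back to creating a fresh local metastore instead of using
    // the shared one.
    System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083")

    val sc = new SparkContext(new SparkConf().setAppName("SparkHiveJob"))
    val hiveContext = new HiveContext(sc)

    // Tables visible in the Hue Hive editor should now resolve.
    hiveContext.sql("SHOW TABLES").show()
  }
}
```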


5 REPLIES

New Contributor
I'm facing exactly the same issue. Running a Spark job that uses a HiveContext from an Oozie Spark action fails to connect to the Hive metastore. I also tried adding the hive-site.xml in the various places mentioned above, to no avail. So where is the right place to configure the Oozie Spark action to play nicely with Hive?

Contributor

I'm off to other areas for the time being due to timeboxing, but I will need to figure this out eventually. I did find a few other posts on the web from people having the same problem, but no solutions. Good luck.

 

If/when I get back to this and figure it out, I'll post a solution.


Contributor
Perfect! I added it and can see the right Hive environment now. It is pretty enough. Thanks!

Hopefully we can figure out how to do it through Oozie later. But, I'm happy...

New Contributor

Found another way of achieving this which also works for PySpark in an Oozie Spark action.

 

Add this to the <spark-opts> tag in the action definition:

--conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://<your-hive-metastore>:9083

This adds the metastore URI to the application master environment and should allow the job to connect to Hive and use its tables from a PySpark script.
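Putting that together, the action definition might look like the sketch below. The script name and HDFS path are placeholders, and the metastore host is the same placeholder as above:

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>MyPySparkHiveApp</name>
    <jar>${nameNode}/user/oozie/apps/my_pyspark_script.py</jar>
    <!-- Pass the metastore URI into the application master environment -->
    <spark-opts>--conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://<your-hive-metastore>:9083</spark-opts>
</spark>
```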