Oozie - Spark Action - Hive
Created on 04-25-2016 10:54 AM - edited 09-16-2022 03:15 AM
I'm having trouble getting a real Hive context in a Spark Scala application (jar) that runs as an Oozie Spark action. The Spark app writes to HDFS folders just fine, but it cannot see the same tables I see in the Hue Hive editor; it seems to be pointing at (or creating) a new metastore somewhere. I have tried including hive-site.xml in various places, to no effect:
- The job xml for the spark action
- The job xml for the workflow
- A file tag in the workflow.xml for the spark action
- etc.
I have run the code successfully many times in spark-shell, so I suspect I've specified hive-site.xml incorrectly in one of those locations.
Any thoughts on what I am missing?
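For reference, this is roughly what the file-tag attempt looks like (a sketch only; the action name, class, jar, and paths below are placeholders, not my real values):

    <action name="spark-hive">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>SparkHiveJob</name>
            <class>com.example.SparkHiveJob</class>
            <jar>${nameNode}/user/me/lib/spark-hive-job.jar</jar>
            <!-- hive-site.xml uploaded to HDFS and shipped alongside the action -->
            <file>${nameNode}/user/me/conf/hive-site.xml#hive-site.xml</file>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>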
Created 05-13-2016 05:56 AM
I'm off to other areas for the time being due to timeboxing, but I will need to figure this out eventually. I did find a few other posts on the web from people having the same problem, though no solutions. Good luck.
If/when I get back to this and figure it out, I'll post a solution.
Created 05-17-2016 12:58 AM
I found a solution, even though it is not the prettiest. In the Spark job, before creating the SparkContext, you need to set a system property for the Hive metastore URI, like so:

    System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083")

I tried setting this through the Oozie configuration instead, but to no avail; so far, this is the only way I could make it work.
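In context, a minimal sketch of the job's entry point (this assumes Spark 1.x with a HiveContext; the metastore host is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SparkHiveJob {
      def main(args: Array[String]): Unit = {
        // Point Hive at the real metastore *before* any context is created;
        // otherwise Spark falls back to a fresh local (Derby) metastore.
        System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083")

        val sc = new SparkContext(new SparkConf().setAppName("SparkHiveJob"))
        val hiveContext = new HiveContext(sc)

        // The same tables visible in the Hue Hive editor should show up here.
        hiveContext.sql("SHOW TABLES").show()

        sc.stop()
      }
    }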
Created 05-17-2016 10:32 AM
Hopefully we can figure out how to do it through the Oozie configuration later. But I'm happy...
Created 12-20-2016 06:02 AM
I found another way of achieving this, which also works for PySpark in an Oozie Spark action. Add this to the <spark-opts> tag in the action definition:

    --conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://<your-hive-metastore>:9083

This adds the metastore URI to the application master's environment and should allow the job to connect to Hive and use its tables from a PySpark script.
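For context, the full action would look something like this (a sketch; the script path and the ${metastoreHost} property are placeholders I'm assuming, not values from this thread):

    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>PySparkHiveJob</name>
        <!-- For PySpark, <jar> points at the Python script -->
        <jar>${nameNode}/user/me/scripts/hive_job.py</jar>
        <spark-opts>--conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://${metastoreHost}:9083</spark-opts>
    </spark>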
