Created on 04-25-2016 10:54 AM - edited 09-16-2022 03:15 AM
I'm having problems getting a real HiveContext in a Spark Scala application (jar) that runs as an Oozie Spark action. The Spark app writes to HDFS folders just fine, but it cannot see the same tables that I see in the Hue Hive editor; it seems to be creating a new metastore somewhere. I have tried including hive-site.xml in various locations, but to no effect.
I have run the code successfully many times in spark-shell, so I probably placed the file incorrectly in one of those locations.
Any thoughts on what I am missing?
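For reference, the job creates its HiveContext the standard Spark 1.x way; here is a minimal sketch (the object name, app name, and query are illustrative, not the actual job code):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTablesJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveTablesJob"))
    // Without a usable hive-site.xml on the classpath, HiveContext silently
    // falls back to a fresh local Derby metastore, which would explain why
    // none of the tables visible in the Hue Hive editor show up here.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW TABLES").show()
  }
}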
Created 05-13-2016 05:56 AM
I'm off to other areas for the time being due to timeboxing, but I will need to figure this out eventually. I did find a few other posts on the web from people having the same problem, but no solutions. Good luck.
If/when I get back to this and figure it out, I'll post the solution.
Created 05-17-2016 12:58 AM
I found a solution, even though it is not the prettiest.
In the Spark job, before creating the SparkContext, you need to set a system property for the Hive metastore URI like so:
System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083");
I tried setting this through the Oozie configuration, but to no avail; so far, this is the only way I could make it work.
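In context, the call has to come before the contexts are constructed; a sketch of the ordering (the object name, app name, and query are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OozieSparkHiveJob {
  def main(args: Array[String]): Unit = {
    // Must be set before the SparkContext/HiveContext exist, otherwise the
    // embedded Hive client has already decided where its metastore lives.
    System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083")

    val sc = new SparkContext(new SparkConf().setAppName("OozieSparkHiveJob"))
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW TABLES").show()  // should now list the real metastore's tables
  }
}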
Created 12-20-2016 06:02 AM
I found another way to achieve this, one that also works for PySpark in an Oozie Spark action.
Add this to the <spark-opts> tag in the action definition:
--conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://<your-hive-metastore>:9083
This adds the metastore URI to the application master's environment, which should let the job connect to Hive and use its tables from a PySpark script.
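For reference, the full action definition would look roughly like this (a sketch; the action name, script path, and metastore placeholder need to be adapted to your own workflow):

<action name="spark-hive-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>SparkHiveJob</name>
        <jar>${nameNode}/user/oozie/apps/spark/my_script.py</jar>
        <spark-opts>--conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://<your-hive-metastore>:9083</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>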