Oozie - Spark Action - Hive
Created on 04-25-2016 10:54 AM - edited 09-16-2022 03:15 AM
I'm having trouble getting a real Hive context in a Spark Scala application (jar) that runs as an Oozie Spark action. The Spark app writes to HDFS folders just fine, but it cannot see the same tables I see in the Hue Hive editor; it seems to be pointing at (or creating) a new metastore somewhere. I have tried including hive-site.xml in various places, to no effect:
- The job xml for the spark action
- The job xml for the workflow
- A file tag in the workflow.xml for the spark action
- etc.
I have run the code successfully many times in spark-shell, so I suspect I've specified hive-site.xml incorrectly in one of those locations.
Any thoughts on what I am missing?
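For reference, this is roughly what the file-tag attempt looks like (a sketch only; the action name, class, jar, and paths below are placeholders, not my real values):

    <action name="spark-hive">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>SparkHiveJob</name>
            <class>com.example.SparkHiveJob</class>
            <jar>${nameNode}/user/me/lib/spark-hive-job.jar</jar>
            <!-- hive-site.xml uploaded to HDFS and shipped alongside the action -->
            <file>${nameNode}/user/me/conf/hive-site.xml#hive-site.xml</file>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>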
Created 05-13-2016 05:56 AM
I'm off to other areas for the time being due to timeboxing, but I will need to figure this out eventually. I did find a few other posts on the web from people having the same problem, though no solutions. Good luck.
If/when I get back to this and figure it out, I'll post a solution.
Created 05-17-2016 12:58 AM
I found a solution, even though it is not the prettiest. In the Spark job, before creating the SparkContext, you need to set a system property for the Hive metastore URI, like so:

    System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083")

I tried setting this through the Oozie configuration instead, but to no avail; so far, this is the only way I could make it work.
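In context, a minimal sketch of the job's entry point (this assumes Spark 1.x with a HiveContext; the metastore host is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SparkHiveJob {
      def main(args: Array[String]): Unit = {
        // Point Hive at the real metastore *before* any context is created;
        // otherwise Spark falls back to a fresh local (Derby) metastore.
        System.setProperty("hive.metastore.uris", "thrift://<your metastore host>:9083")

        val sc = new SparkContext(new SparkConf().setAppName("SparkHiveJob"))
        val hiveContext = new HiveContext(sc)

        // The same tables visible in the Hue Hive editor should show up here.
        hiveContext.sql("SHOW TABLES").show()

        sc.stop()
      }
    }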
Created 05-17-2016 10:32 AM
Hopefully we can figure out how to do it through the Oozie configuration later. But I'm happy...
Created 12-20-2016 06:02 AM
I found another way of achieving this, which also works for PySpark in an Oozie Spark action. Add this to the <spark-opts> tag in the action definition:

    --conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://<your-hive-metastore>:9083

This adds the metastore URI to the application master's environment and should allow the job to connect to Hive and use its tables from a PySpark script.
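For context, the full action would look something like this (a sketch; the script path and the ${metastoreHost} property are placeholders I'm assuming, not values from this thread):

    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>PySparkHiveJob</name>
        <!-- For PySpark, <jar> points at the Python script -->
        <jar>${nameNode}/user/me/scripts/hive_job.py</jar>
        <spark-opts>--conf spark.yarn.appMasterEnv.hive.metastore.uris=thrift://${metastoreHost}:9083</spark-opts>
    </spark>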
