Passing hbase-site.xml to spark jobs using spark.driver.extraLibraryPath

I am running Cloudera Express 5.12.1 with CDH-5.12.1.

I need to read/write data from HBase from my Spark jobs.

I set the following settings:

spark.driver.extraLibraryPath=/etc/hbase/conf/hbase-site.xml
spark.executor.extraLibraryPath=/etc/hbase/conf/hbase-site.xml

I added these in CM, under "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf" in the Spark service. I can see my values written correctly to /etc/spark/conf/spark-defaults.conf.

Now I submit a Spark job (as part of an Oozie workflow). I expect the HBase config to be picked up from the classpath, but it is not. I can see the correct setting in the Spark History Server for the job:

spark.driver.extraLibraryPath	/etc/hbase/conf/hbase-site.xml:/var/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/lib/native

However, the ZooKeeper connection does not honor the host specified in the hbase.zookeeper.quorum setting in hbase-site.xml; it tries to connect to localhost instead.

I would like to specify hbase-site.xml globally rather than for each Spark job, since I have many of them. What am I doing wrong? What is the best practice to follow in this case?

Thanks

1 REPLY

Without reviewing the logs, I can only guess at the issue. If you are submitting the Spark job from an Oozie Shell action, I suspect the problem is Oozie's behavior of setting the HADOOP_CONF_DIR environment variable behind the scenes. There is an internal Jira that tracks this behavior. The KB [1] explains this a bit (it was reported for hive-site.xml, but I think it may affect the HBase client configuration as well).

Try working around the problem by following the instructions on the KB [1] and see if it helps.
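
Before applying the workaround, it may help to confirm the guess above. The following is only a rough diagnostic sketch, not part of the KB: it assumes Python is available on the node where the Oozie Shell action runs, and the function names read_property and report_hbase_conf are made up for illustration. It prints the hbase.zookeeper.quorum value found in the directory named by HADOOP_CONF_DIR (if set) and in /etc/hbase/conf, so you can see whether the environment Oozie provides points at a directory that actually contains your hbase-site.xml.

```python
import os
import xml.etree.ElementTree as ET


def read_property(site_xml_path, name):
    """Return the value of a named property from a Hadoop-style *-site.xml, or None."""
    root = ET.parse(site_xml_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def report_hbase_conf(conf_dirs):
    """For each candidate directory, return (directory, quorum).

    quorum is None when the directory has no hbase-site.xml or the
    file does not define hbase.zookeeper.quorum.
    """
    results = []
    for conf_dir in conf_dirs:
        site = os.path.join(conf_dir, "hbase-site.xml")
        if os.path.isfile(site):
            results.append((conf_dir, read_property(site, "hbase.zookeeper.quorum")))
        else:
            results.append((conf_dir, None))
    return results


if __name__ == "__main__":
    # HADOOP_CONF_DIR is whatever the launching environment (e.g. Oozie) set;
    # /etc/hbase/conf is the location mentioned in the question.
    candidates = [d for d in (os.environ.get("HADOOP_CONF_DIR"), "/etc/hbase/conf") if d]
    for conf_dir, quorum in report_hbase_conf(candidates):
        print("{}: hbase.zookeeper.quorum = {}".format(conf_dir, quorum))
```

If the HADOOP_CONF_DIR line shows None while /etc/hbase/conf shows your real quorum, that would be consistent with the client falling back to its default of localhost.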

Thanks

Lingesh

[1]: https://community.cloudera.com/t5/Customer/How-to-run-a-Spark2-job-using-Oozie-Shell-Action-which/ta...