I am running Cloudera Express 5.12.1 with CDH-5.12.1.
I need to read/write data from HBase from my Spark jobs.
I set thefollowing settings:
spark.driver.extraLibraryPath=/etc/hbase/conf/hbase-site.xml
spark.executor.extraLibraryPath=/etc/hbase/conf/hbase-site.xml
In CM "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf." in the Spark Service. I can see my values being written in /etc/spark/conf/spark-defaults.conf correctly.
Now I submit a Spark job (part of Oozie workflow). I expect that the hbase config is picked up from the class path, but it is not. I can see the correct setting in the Spark Histroy Server for the job:
spark.driver.extraLibraryPath /etc/hbase/conf/hbase-site.xml:/var/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/lib/native
but the connection to the Zookeeper is not honoring the host that is specified in the hbase.zookeeper.quorum setting from the hbase-site.xml file, but is trying to connect to localhost.
I wish I can specify the hbase-site.xml globally, and not for each of the Spark jobs, as I have many, so my intention is to confgure this globally. What am I doing wrong? What is the best practice to follow in this case?
Thanks