The following jars need to be on the CLASSPATH of the HBase region servers:
scala-library, hbase-spark, and hbase-spark-protocol-shaded.
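One way to get those jars onto the region servers is to append them to HBASE_CLASSPATH in hbase-env.sh and restart the region servers. The paths below are assumptions; adjust them to wherever your distribution keeps its HBase config and wherever you placed the jars:

```shell
# Assumed locations -- substitute your own. Run on every region server host.
# /opt/extra-jars holds scala-library, hbase-spark, and
# hbase-spark-protocol-shaded; /etc/hbase/conf is the HBase config dir.
echo 'export HBASE_CLASSPATH="$HBASE_CLASSPATH:/opt/extra-jars/*"' \
  >> /etc/hbase/conf/hbase-env.sh

# Restart the region server so the new classpath takes effect
# (command varies by distribution; this is the generic script).
hbase-daemon.sh restart regionserver
```

On managed distributions (Cloudera Manager, Ambari) you would normally set this through the management UI's safety-valve/advanced config for hbase-env.sh rather than editing the file directly.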
This server-side configuration is needed for column filter pushdown.
If you cannot perform the server-side configuration, consider using .option("hbase.spark.pushdown.columnfilter", false) to disable pushdown on the client side instead.
So the --jars option of spark-submit does make the jars accessible to the Spark driver and executors, but when you run a qualifier filter operation, Spark apparently delegates some of the filtering work to the HBase region servers, so the jars need to be on the classpath of the region servers' Java processes too?
Never mind, I figured it out. First, go to /etc/spark/conf.cloudera.spark_on_yarn/classpath.txt and delete the last line (which contains the path to hbase-class.jar). Then download hbase-spark-22.214.171.124.2.15.0-147.jar, and when you run spark-shell, add --jars pathToYourDownloadedJar. Finally, add option("hbase.spark.pushdown.columnfilter", false) before loading the data, like this:
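A minimal sketch of what that read looks like in spark-shell. The table name and column mapping are made-up placeholders; only the format name and the pushdown option come from the discussion above:

```scala
// Run inside spark-shell started with --jars pathToYourDownloadedJar.
// "my_table" and the columns mapping are hypothetical -- use your own.
val df = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.table", "my_table")
  .option("hbase.columns.mapping",
    "id STRING :key, name STRING cf:name")
  // Disable server-side column filter pushdown, since the region
  // servers do not have the connector jars on their classpath.
  .option("hbase.spark.pushdown.columnfilter", false)
  .load()

// The qualifier filter now runs client-side in Spark instead of
// being pushed down to the region servers.
df.filter($"name" === "alice").show()
```

With pushdown disabled, filtering happens in the executors after the rows are fetched, so it trades extra data transfer for not having to touch the region servers' classpath.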