This project demonstrates how to build, deploy and run a Spark Scala app that runs in a Kerberized cluster. There are scripts to create and populate an HBase table and to run the test. The README has more details on how to run it.
The HBase conf directory should be on SPARK_CLASSPATH so that hbase-site.xml is picked up from the classpath.
This approach does not work for long-running jobs: the job needs to complete before the Kerberos token expires.
The example illustrates the use of the HBase InputFormat for obtaining an RDD. It also demonstrates using the HBase API for a Get operation for a row key.
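The two access paths described above can be sketched roughly as follows. This is an illustrative sketch, not the code shipped in this repo: the table name "test_table" and row key "row1" are hypothetical placeholders, and the sketch assumes the HBase 1.0+ client API (ConnectionFactory) with hbase-site.xml available on the classpath. It needs a running cluster and a valid Kerberos ticket to execute.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseReadSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical names, used for illustration only.
    val tableName = "test_table"
    val rowKey = "row1"

    val sc = new SparkContext(new SparkConf().setAppName("HBaseReadSketch"))

    // Loads hbase-site.xml from the classpath.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName)

    // 1) Scan the table as an RDD through the HBase InputFormat.
    val rdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])
    println(s"Rows read through TableInputFormat: ${rdd.count()}")

    // 2) Fetch a single row by key with a Get through the HBase client API.
    val connection = ConnectionFactory.createConnection(hbaseConf)
    try {
      val table = connection.getTable(TableName.valueOf(tableName))
      try {
        val result = table.get(new Get(Bytes.toBytes(rowKey)))
        println(s"Row '$rowKey' found: ${!result.isEmpty}")
      } finally {
        table.close()
      }
    } finally {
      connection.close()
    }

    sc.stop()
  }
}
```

The RDD path suits full-table or range scans distributed across executors, while the Get path is a point lookup issued from wherever the code runs (the driver, in this sketch).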
The repo includes a Maven project that builds a tar containing a jar and scripts to help run the test in your cluster.
The script (run_example.sh) uses two executors to demonstrate that the Kerberos token is sent to the executors.