The Apache Spark - Apache HBase Connector (SHC) is a library that lets Spark access HBase tables as an external data source or sink. It provides high-performance HBase access through Spark SQL and DataFrames. SHC implements the standard Spark data source APIs and leverages the Spark Catalyst engine for query optimization. It bridges the gap between the simple HBase key-value store and complex relational SQL queries, enabling users to perform complex data analytics on top of HBase using Spark.
With DataFrame support, SHC leverages the optimization techniques in Catalyst and achieves data locality, partition pruning, predicate pushdown, scanning, and BulkGet. For details, refer to the README in the SHC GitHub repository, which is kept up to date.
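As a rough illustration of the data source usage described above, the sketch below builds a table catalog (the JSON mapping between DataFrame columns and HBase column families and qualifiers) and passes it to the Spark reader. The table name, column names, and the helper functions `build_catalog` and `read_hbase_table` are hypothetical. The `catalog` option key and the data source class name follow the SHC README as best recalled, so check them against the README for the SHC version in use.

```python
import json

# Data source class name as given in the SHC README; verify for your SHC version.
SHC_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def build_catalog(namespace, table, rowkey, columns):
    """Return an SHC-style catalog as a JSON string.

    `columns` maps a DataFrame column name to a tuple of
    (column family, column qualifier, type). The row key column
    uses the reserved column family "rowkey".
    """
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": {
            name: {"cf": cf, "col": col, "type": typ}
            for name, (cf, col, typ) in columns.items()
        },
    })


# Hypothetical table layout, for illustration only.
catalog = build_catalog(
    namespace="default",
    table="table1",
    rowkey="key",
    columns={
        "col0": ("rowkey", "key", "string"),
        "col1": ("cf1", "col1", "string"),
        "col2": ("cf2", "col2", "int"),
    },
)


def read_hbase_table(spark, catalog_json):
    """Load the HBase table as a DataFrame.

    Requires a SparkSession, the SHC jar on the classpath, and a
    reachable HBase cluster; none of these are needed to build the catalog.
    """
    return spark.read.options(catalog=catalog_json).format(SHC_FORMAT).load()
```

With a running cluster, a call such as `read_hbase_table(spark, catalog).filter("col0 <= 'row050'").select("col0", "col1")` would be a typical query. Whether a given filter is pushed down to HBase depends on the SHC version and the predicate, so inspect the plan with `explain()` rather than assuming it.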