Community Articles
Find and share helpful community-sourced technical articles
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.
Labels (2)

The Apache Spark - Apache HBase Connector (SHC) is a library to support Spark accessing HBase table as external data source or sink. It provides high performance HBase access via SparkSQL and DataFrames. SHC implements the standard Spark data source APIs, and leverages the Spark catalyst engine for query optimization. It bridges the gap between the simple HBase Key Value store and complex relational SQL queries and enables users to perform complex data analytics on top of HBase using Spark.

With the data frame support, SHC leverages all the optimization techniques in catalyst, and achieves data locality, partition pruning, predicate pushdown, Scanning and BulkGet, etc. For the detailed information, please refer the README in SHC github, which is kept up-to-date.

Don't have an account?
Coming from Hortonworks? Activate your account here
Version history
Revision #:
1 of 1
Last update:
‎01-01-2017 08:24 AM
Updated by:
Top Kudoed Authors