Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.
Labels (2)
Explorer

The Apache Spark - Apache HBase Connector (SHC) is a library to support Spark accessing HBase table as external data source or sink. It provides high performance HBase access via SparkSQL and DataFrames. SHC implements the standard Spark data source APIs, and leverages the Spark catalyst engine for query optimization. It bridges the gap between the simple HBase Key Value store and complex relational SQL queries and enables users to perform complex data analytics on top of HBase using Spark.

With the data frame support, SHC leverages all the optimization techniques in catalyst, and achieves data locality, partition pruning, predicate pushdown, Scanning and BulkGet, etc. For the detailed information, please refer the README in SHC github, which is kept up-to-date.

1,845 Views
Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.
Version history
Last update:
‎01-01-2017 08:24 AM
Updated by:
Contributors
Top Kudoed Authors