
HBase Spark Module work with DataFrames

Rising Star

Have there been any advances in the HBase Spark module included with CDH? So far, I see that it works with RDDs, but only in a very cumbersome way. Is DataFrame support coming, or is it already available somewhere? Working with DataFrames would make reading and writing HBase data much easier and faster to code than using RDDs.

Thanks,

Ben

2 REPLIES

Re: HBase Spark Module work with DataFrames

Expert Contributor

It looks like read support was added with this Jira: https://issues.apache.org/jira/browse/HBASE-14181, which has been available since CDH 5.7. Write support is still a work in progress: https://issues.apache.org/jira/browse/HBASE-15336
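
For illustration, here is a rough PySpark sketch of what a DataFrame read through the module might look like. It assumes the data source name "org.apache.hadoop.hbase.spark" and the "hbase.table" / "hbase.columns.mapping" options as I understand them from HBASE-14181; the table name, column names, and the two helper functions are made up for the example, and your CDH version may need extra options (for instance, to locate hbase-site.xml), so check the docs for your release.

```python
def hbase_read_options(table, columns):
    """Build the option map for the hbase-spark DataFrame source.

    columns is a list of (df_column, sql_type, hbase_column) tuples, where
    hbase_column is ":key" for the row key or "family:qualifier" otherwise.
    This helper is hypothetical; it is not part of the module itself.
    """
    mapping = ", ".join(
        "{} {} {}".format(name, sql_type, hbase_col)
        for name, sql_type, hbase_col in columns
    )
    return {"hbase.table": table, "hbase.columns.mapping": mapping}


def read_hbase_table(sql_context, table, columns):
    """Load an HBase table as a DataFrame (needs the hbase-spark jar on the classpath)."""
    reader = sql_context.read.format("org.apache.hadoop.hbase.spark")
    for key, value in hbase_read_options(table, columns).items():
        reader = reader.option(key, value)
    return reader.load()


# Example mapping for a hypothetical table "t1" with column family "c".
example_columns = [
    ("KEY_FIELD", "STRING", ":key"),
    ("A_FIELD", "STRING", "c:a"),
    ("B_FIELD", "STRING", "c:b"),
]
print(hbase_read_options("t1", example_columns))
```

With a running cluster you would call read_hbase_table(sqlContext, "t1", example_columns) and then query the result like any other DataFrame.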

Re: HBase Spark Module work with DataFrames

Explorer

I wouldn't recommend the following workaround in production environments where performance is important, but you can create a Hive external table on top of the HBase table and use Spark JDBC to create a DataFrame on top of that Hive table via Impala.

Spark SQL/Spark JDBC selects and inserts work against the Hive external table, and Impala selects and inserts (including updates done via inserts) work as well.
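
A minimal sketch of that workaround, in case it helps. The DDL uses Hive's HBaseStorageHandler; the table and column names, both helper functions, and the JDBC URL and driver class in the comments are placeholders I made up, and the real values depend on which Impala or Hive JDBC driver you have installed.

```python
def hive_over_hbase_ddl(hive_table, hbase_table, columns):
    """Return Hive DDL for an external table backed by an existing HBase table.

    columns is a list of (hive_column, hive_type, hbase_column) tuples, with
    ":key" as the hbase_column of the row key. Hypothetical helper.
    """
    col_defs = ", ".join("{} {}".format(name, hive_type) for name, hive_type, _ in columns)
    # The mapping list must not contain whitespace between entries.
    mapping = ",".join(hbase_col for _, _, hbase_col in columns)
    return (
        "CREATE EXTERNAL TABLE {hive} ({cols}) "
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
        "WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}') "
        "TBLPROPERTIES ('hbase.table.name' = '{hbase}')"
    ).format(hive=hive_table, cols=col_defs, mapping=mapping, hbase=hbase_table)


def read_via_jdbc(sql_context, jdbc_url, driver_class, table):
    """Create a DataFrame over a table exposed through a JDBC endpoint."""
    return (
        sql_context.read.format("jdbc")
        .option("url", jdbc_url)        # e.g. "jdbc:impala://impala-host:21050/default" (placeholder)
        .option("driver", driver_class)  # driver class name depends on the installed JDBC driver
        .option("dbtable", table)
        .load()
    )


# Run the printed statement in Hive (beeline), then refresh Impala's metadata.
print(hive_over_hbase_ddl(
    "books_hive", "books",
    [("id", "string", ":key"), ("title", "string", "info:title")],
))
```

Every read then goes Spark -> JDBC -> Impala -> HBase, which is part of why I would not rely on this where performance matters.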