HBase Spark Module work with DataFrames

Rising Star

Have there been any advances in the HBase Spark module included with CDH? So far, I see that it works with RDDs, but in a very cumbersome manner. I was wondering whether DataFrame support is coming or is already available somewhere. Working with DataFrames would make reading and writing data to HBase much easier and faster to code than using RDDs.





Re: HBase Spark Module work with DataFrames

Expert Contributor

It looks like read support was added with this Jira: which has been available since CDH 5.7. Write support is still a work in progress:

Re: HBase Spark Module work with DataFrames


You can create a Hive external table on top of an HBase table and use Spark JDBC to create a DataFrame on top of the Hive table via Impala. I wouldn't recommend this in production environments where performance is important, though.

Spark SQL/Spark JDBC (selects and inserts) works against the Hive external table, and so do Impala selects and even inserts (including updates performed via inserts).
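
For anyone who wants to try the Hive external table route, here is a rough PySpark sketch of what it might look like. Everything specific in it is an assumption on my part rather than something from this thread: the table and column names, the host, the default Impala port 21050, and the use of the Cloudera Impala JDBC driver class `com.cloudera.impala.jdbc41.Driver` (the driver jar would have to be on the Spark classpath). The helper function names are made up for illustration.

```python
# Sketch only: table, column, host and driver names are hypothetical.

def hive_external_table_ddl(hive_table, hbase_table, columns):
    """Build Hive DDL that maps an existing HBase table to an external Hive table.

    columns is a list of (hive_column, hive_type, hbase_mapping) tuples, where
    hbase_mapping is ":key" for the row key or "family:qualifier" otherwise.
    """
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    # No spaces between entries: the mapping string is taken literally.
    mapping = ",".join(hbase_mapping for _, _, hbase_mapping in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs}) "
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}') "
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


def impala_jdbc_options(host, table, port=21050, database="default"):
    """Options for Spark's generic JDBC data source, pointed at Impala (assumed driver)."""
    return {
        "url": f"jdbc:impala://{host}:{port}/{database}",
        "driver": "com.cloudera.impala.jdbc41.Driver",
        "dbtable": table,
    }


def read_via_impala(spark, options):
    """Return a DataFrame backed by the Hive external table, queried through Impala."""
    return spark.read.format("jdbc").options(**options).load()


def append_via_impala(df, options):
    """Append rows through JDBC inserts; the target table must already exist."""
    df.write.format("jdbc").options(**options).mode("append").save()


if __name__ == "__main__":
    # Run the printed statement in Hive (for example via beeline) first.
    print(hive_external_table_ddl(
        "hbase_users", "users",
        [("id", "STRING", ":key"), ("name", "STRING", "cf:name")],
    ))
```

After creating the table in Hive, Impala typically needs an `INVALIDATE METADATA` on it before the table becomes visible. With a running SparkSession named `spark`, usage would be along the lines of `df = read_via_impala(spark, impala_jdbc_options("impala-host", "hbase_users"))`. Every read and write goes through a single JDBC connection per partition, which is one reason this is not a good fit where performance matters.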