Have there been any advances in the HBase Spark module included with CDH? So far, I see that it works with RDDs, but in a very cumbersome manner. I was wondering whether DataFrame support is coming or is already available somewhere. Working with DataFrames would make reading and writing data to HBase much easier and faster to code than using RDDs.
I wouldn't recommend the following approach in production environments where performance is important.
That said, you can create a Hive external table on top of an HBase table, and use Spark JDBC to create a DataFrame on top of that Hive table via Impala.
Spark SQL / Spark JDBC (selects and inserts) works against the Hive external table, and so do Impala selects and inserts (including updates performed via inserts).
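To make the workaround above a bit more concrete, here is a rough PySpark-side sketch, not a tested recipe. The first helper builds the Hive DDL that maps an external table onto HBase through the standard HBaseStorageHandler; the others read that table through Spark's JDBC data source. All table, column, and host names are made up for illustration, and the JDBC URL prefix and driver class are assumptions that depend on which Impala JDBC driver jar you actually have on the Spark classpath.

```python
def hive_on_hbase_ddl(hive_table, hbase_table, key_column, column_map):
    """Build Hive DDL for an external table backed by an existing HBase table.

    column_map is an ordered list of (hive_column, hive_type, "family:qualifier").
    The HBase row key is exposed as a STRING column named key_column.
    """
    cols = [f"{key_column} STRING"] + [f"{name} {typ}" for name, typ, _ in column_map]
    # ":key" maps the first Hive column to the HBase row key.
    mapping = ",".join([":key"] + [hbase_col for _, _, hbase_col in column_map])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(cols)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


def impala_jdbc_options(host, table, port=21050, database="default"):
    """Options for Spark's JDBC source.

    ASSUMPTION: URL prefix and driver class match the Cloudera Impala
    JDBC 4.1 driver; adjust both to whatever driver you have installed.
    """
    return {
        "url": f"jdbc:impala://{host}:{port}/{database}",
        "dbtable": table,
        "driver": "com.cloudera.impala.jdbc41.Driver",
    }


def read_via_impala(spark, host, table):
    """Return a DataFrame over the Hive-on-HBase table, given a SparkSession."""
    return spark.read.format("jdbc").options(**impala_jdbc_options(host, table)).load()
```

You would run the generated DDL once in Hive (beeline or Hue), then call something like `read_via_impala(spark, "impala-host", "users_hive")` from your Spark job. If the table was created in Hive, Impala usually needs an `INVALIDATE METADATA` before it can see it.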