Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Reading from and Writing to HBase with a spark DataFrame

Super Collaborator

Hello,

I was recently tasked with working out a way to read data from HBase into a Spark DataFrame and, once the transformation / enrichment is done, write the DataFrame back into HBase.

What is the best way of doing this? I can see that Cloudera has a sparkOnHBase package, but I think they have contributed the code to HBase, and the Maven modules are at version 0.0.x-clabs-SNAPSHOT, which doesn't sound reassuring. There is also an HBase-Spark module in Apache HBase, but it seems that it has not been released yet.

Ideally it would be something similar to these:

// using spark-csv from databricks
DataFrame csvDF = sqlContext.read()
        .format("csv")
        .options(options)
        .load(hdfs.getURI("hdfs://sandbox:8020"));

// using spark-solr from lucidworks
DataFrame solrDF = sqlContext.read()
        .format("solr")
        .options(options)
        .load();

Is there something similar to these in the HBase world?

I have also seen this thread with the experimental connector but I would really prefer something more mature.

Thanks in advance!

1 ACCEPTED SOLUTION

Hi @David Tam, for a working example using phoenix-spark to read/write HBase DataFrames, check out https://github.com/randerzander/HiveToPhoenix
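
For anyone who wants the same `.format(...).options(options)` shape as the snippets in the question, here is a minimal sketch of how the phoenix-spark options could be put together. It is not taken from the linked repo. The `PhoenixSparkOptions` class and its `forTable` method are hypothetical helpers made up for illustration, and the table names and the `sandbox:2181` ZooKeeper address are placeholders. The `org.apache.phoenix.spark` format name and the `table` / `zkUrl` option keys are the ones described on the phoenix-spark documentation page. The Spark calls themselves are left as comments so the block runs without Spark on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds the option map for the phoenix-spark data source. */
final class PhoenixSparkOptions {
    // Data source name used with DataFrameReader.format / DataFrameWriter.format.
    static final String FORMAT = "org.apache.phoenix.spark";

    private PhoenixSparkOptions() {}

    /** Returns the "table" and "zkUrl" options for one Phoenix table. */
    static Map<String, String> forTable(String table, String zkUrl) {
        if (table == null || table.isEmpty()) {
            throw new IllegalArgumentException("table must be non-empty");
        }
        if (zkUrl == null || zkUrl.isEmpty()) {
            throw new IllegalArgumentException("zkUrl must be non-empty");
        }
        Map<String, String> options = new LinkedHashMap<>();
        options.put("table", table);
        options.put("zkUrl", zkUrl);
        return options;
    }

    public static void main(String[] args) {
        // Placeholder table names and ZooKeeper quorum; replace with your own.
        Map<String, String> readOptions = forTable("INPUT_TABLE", "sandbox:2181");
        Map<String, String> writeOptions = forTable("OUTPUT_TABLE", "sandbox:2181");
        System.out.println("read:  " + FORMAT + " " + readOptions);
        System.out.println("write: " + FORMAT + " " + writeOptions);

        // With Spark and phoenix-spark on the classpath, the maps would be used
        // roughly like this (not executed here):
        //
        //   DataFrame df = sqlContext.read()
        //           .format(PhoenixSparkOptions.FORMAT)
        //           .options(readOptions)
        //           .load();
        //
        //   df.write()
        //           .format(PhoenixSparkOptions.FORMAT)
        //           .mode(SaveMode.Overwrite)
        //           .options(writeOptions)
        //           .save();
    }
}
```

Note that the output table has to exist in Phoenix before the write, and the phoenix-spark page describes which save mode it expects, so please check it against the Phoenix version you are running.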

5 REPLIES

Master Mentor

@David Tam

Right now the only definite answer is https://phoenix.apache.org/phoenix_spark.html

HBase-Spark has not been released yet. It should be coming soon, but no timeline has been announced.

Master Mentor

@David Tam Amazing to see all the JIRAs on the same topic: https://issues.apache.org/jira/browse/HBASE-14181

Super Collaborator

Thanks all for the input. The phoenix-spark example looks very close to what we need, but I am not sure whether people on my team would be happy with Phoenix. I will bring it up and see. Meanwhile, I will also follow the HBase JIRA and hope that the module is released soon.

Thank you!