Reading from and Writing to HBase with a Spark DataFrame

Super Collaborator

Hello,

I was recently tasked with building something that reads data from HBase into a Spark DataFrame and, once the transformation / enrichment is done, writes the DataFrame back into HBase.

What is the best way of doing this? Cloudera has a SparkOnHBase package, but I think they have contributed the code to HBase, and the Maven modules carry version 0.0.x-clabs-SNAPSHOT, which doesn't sound reassuring. There is also an HBase-Spark module in Apache HBase, but it does not seem to have been released yet.

Ideally it would be something similar to these:

// using spark-csv from databricks
DataFrame csvDF = sqlContext.read()
        .format("csv")
        .options(options)
        .load(hdfs.getURI("hdfs://sandbox:8020"));

// using spark-solr from lucidworks
DataFrame solrDF = sqlContext.read()
        .format("solr")
        .options(options)
        .load();

Is there something similar to these in the HBase world?

I have also seen this thread about the experimental connector, but I would really prefer something more mature.

Thanks in advance!

1 ACCEPTED SOLUTION

Hi @David Tam, for a working example that uses phoenix-spark to read and write HBase data as DataFrames, check out https://github.com/randerzander/HiveToPhoenix
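
For orientation, here is a minimal sketch of the shape a phoenix-spark call takes, written in the same Java style as the question. It is not taken from the linked repository. The table names and the ZooKeeper address are hypothetical, and the actual DataFrame calls are left as comments because they need Spark and phoenix-spark on the classpath. The runnable part only builds the `table` and `zkUrl` options that the connector reads.

```java
import java.util.HashMap;
import java.util.Map;

class PhoenixSparkOptions {

    // Data source name used with format(...) for the phoenix-spark module.
    public static final String FORMAT = "org.apache.phoenix.spark";

    // Builds the two options the connector reads: the Phoenix table name
    // and the ZooKeeper URL of the HBase cluster.
    public static Map<String, String> options(String table, String zkUrl) {
        Map<String, String> opts = new HashMap<>();
        opts.put("table", table);
        opts.put("zkUrl", zkUrl);
        return opts;
    }

    public static void main(String[] args) {
        // Hypothetical table names and ZooKeeper address (assumptions).
        Map<String, String> in = options("INPUT_TABLE", "sandbox:2181");
        Map<String, String> out = options("OUTPUT_TABLE", "sandbox:2181");

        // With Spark and phoenix-spark available, the read would look like:
        //   DataFrame df = sqlContext.read().format(FORMAT).options(in).load();
        // and the write back (the connector expects SaveMode.Overwrite):
        //   df.write().format(FORMAT).mode(SaveMode.Overwrite).options(out).save();
        System.out.println(FORMAT + " read options:  " + in);
        System.out.println(FORMAT + " write options: " + out);
    }
}
```

Note that this goes through Phoenix, so the HBase table has to be defined as a Phoenix table first.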

5 REPLIES

Master Mentor

@David Tam

Right now the only definite answer is phoenix-spark: https://phoenix.apache.org/phoenix_spark.html

The HBase-Spark module has not been released yet. It should be coming soon, but no timeline has been announced.

Master Mentor

@David Tam Amazing to see all the JIRAs on the same topic: https://issues.apache.org/jira/browse/HBASE-14181

Super Collaborator

Thanks all for the input. The phoenix-spark example looks very close to what we need. I am not sure whether people in my team would be happy with Phoenix, but I will bring it up and see. Meanwhile, I will follow the HBase JIRA and hope the module is released soon.

Thank you!