SparkOnHbase status?

Contributor

I'm wondering what the status of this project is. Its GitHub page shows little to no activity in the last 12 months. Pull requests and issues seem to be ignored.

 

On the other hand, this seems to be the only project that successfully allows bulk writing to HBase in a secure cluster using Spark Streaming. With code and dependency changes I had it working with CDH 5.4 (HBase 1.0 and Spark 1.3). Sadly, since a recent upgrade to CDH 5.5 (Spark 1.5.0) it seems to have stopped working. The logs show nothing: the executors simply don't report any logs or progress, whereas collecting in the driver and writing from there does work.

 

Does anybody have an alternative approach, or hints as to why this project would fail with the change from Spark 1.3 to 1.5?

 

 

Accepted Solutions

Re: SparkOnHbase status?

Contributor

For future reference: there's actually a small bug in SparkOnHBase that prevents it from running with Spark 1.5.0. I've submitted a pull request that addresses this issue.

 

My own fork of SparkOnHBase is fully updated for CDH 5.5, if anybody needs it: https://github.com/rverk/SparkOnHBase

Re: SparkOnHbase status?

Master Collaborator
AFAIK it was merged into the HBase project itself, so development continues there.

Re: SparkOnHbase status?

Contributor

Yes, it was merged into HBase 2.0.0 (https://issues.apache.org/jira/browse/HBASE-13992).

 

As far as I can tell, 2.0 seems a very long way off and I'm not seeing any backports. And even with this merged into HBase, the core issue here is having a reusable connection object on every executor, along with the correct Kerberos principal.
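
To make the "reusable connection object on every executor" idea concrete, here is a minimal sketch of the general pattern, not the SparkOnHBase implementation. It assumes each executor is a single JVM, so a lazily initialized static field yields one shared connection per executor. `ExecutorConnectionHolder` and `FakeConnection` are hypothetical names, and `FakeConnection` is only a stand-in so the example runs without a cluster. In a real job the marked spot is presumably where the Kerberos login (for example Hadoop's `UserGroupInformation.loginUserFromKeytab`) and HBase's `ConnectionFactory.createConnection` would go.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: one lazily created, reusable connection per JVM (that is, per executor). */
public final class ExecutorConnectionHolder {

    /** Hypothetical stand-in for a real HBase connection; counts how many were opened. */
    static final class FakeConnection {
        static final AtomicInteger OPENED = new AtomicInteger();
        final String principal;

        FakeConnection(String principal) {
            this.principal = principal;
            OPENED.incrementAndGet();
        }
    }

    // volatile so that double-checked locking publishes the instance safely
    private static volatile FakeConnection connection;

    private ExecutorConnectionHolder() {}

    /** Returns the shared connection, creating it on first use only. */
    static FakeConnection get(String principal) {
        FakeConnection local = connection;
        if (local == null) {
            synchronized (ExecutorConnectionHolder.class) {
                local = connection;
                if (local == null) {
                    // Assumption: a real job would log in with Kerberos and
                    // open the actual HBase connection here.
                    local = new FakeConnection(principal);
                    connection = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Simulate three partitions handled by tasks on the same executor.
        for (int partition = 0; partition < 3; partition++) {
            FakeConnection c = get("etl@EXAMPLE.COM"); // hypothetical principal
            System.out.println("partition " + partition + " uses " + c.principal);
        }
        System.out.println("connections opened: " + FakeConnection.OPENED.get());
    }
}
```

Each task would call `ExecutorConnectionHolder.get(...)` inside its per-partition function rather than capturing a connection created on the driver, since a live connection generally cannot be serialized and shipped to executors.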

 

It seems really weird to me that this Labs project is the only thing in Spark that allows connecting to HBase on a secure cluster. I know of SPARK-6918, but there has to be a better (less hacky) way.

Re: SparkOnHbase status?

Master Collaborator

Passing this along from the Labs team:

 

SparkOnHBase has made it into upstream HBase on the 2.x line. Ideally, any patches should go to the Apache HBase community mailing lists [1] and JIRA [2].

 

The engineering team is currently working on backporting and hardening SparkOnHBase for a future C5.x HBase release. 

 

[1] http://hbase.apache.org/mail-lists.html

[2] https://issues.apache.org/jira/browse/HBASE