Explorer
Posts: 22
Registered: 01-30-2014

SparkOnHbase status?

I'm wondering what the status of this project is. Its GitHub page shows little to no activity in the last 12 months, and pull requests and issues seem to be ignored.

On the other hand, this seems to be the only project that successfully allows bulk writing to HBase on a secure cluster using Spark Streaming. With some code and dependency changes, I had it working with CDH 5.4 (HBase 1.0 and Spark 1.3). Sadly, since a recent upgrade to CDH 5.5 (Spark 1.5.0), it seems to have stopped working. The logs show nothing: the executors simply don't report any logs or progress. (Collecting the data in the driver and writing from there does work.)

Does anybody have an alternative approach, or hints as to why this project would fail after the change from Spark 1.3 to 1.5?

Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: SparkOnHbase status?

AFAIK it was merged into the HBase project itself, so development continues there.
Explorer
Posts: 22
Registered: 01-30-2014

Re: SparkOnHbase status?

Yes, it was merged into HBase 2.0.0 (https://issues.apache.org/jira/browse/HBASE-13992).

As far as I can tell, 2.0 is still a very long way off, and I'm not seeing any backports. And even with this merged into HBase, the core issue remains: having a reusable connection object on every executor that carries the correct Kerberos principal.

It seems really weird to me that this Labs project is the only thing in Spark that allows connecting to HBase on a secure cluster. I know of SPARK-6918, but there has to be a better (less hacky) way.

Explorer
Posts: 22
Registered: 01-30-2014

Re: SparkOnHbase status?

For future reference: there's actually a small bug in SparkOnHBase that prevents it from running with Spark 1.5.0. I've submitted a pull request that addresses this issue.

My own fork of SparkOnHBase is fully updated for CDH 5.5, if anybody needs it: https://github.com/rverk/SparkOnHBase

Posts: 354
Topics: 162
Kudos: 60
Solutions: 27
Registered: 06-26-2013

Re: SparkOnHbase status?

Passing this along from the Labs team:

 

SparkOnHBase has made it into upstream HBase on the 2.x line. Ideally, any patches should go to the Apache HBase community mailing lists [1] and JIRA [2].

The engineering team is currently working on backporting and hardening SparkOnHBase for a future C5.x HBase release.

[1] http://hbase.apache.org/mail-lists.html

[2] https://issues.apache.org/jira/browse/HBASE
