01-02-2016 08:03 AM
I'm wondering what the status of this project is. Its GitHub page shows little to no activity in the last 12 months. Pull requests and issues seem to be ignored.
On the other hand, this seems to be the only project that successfully allows bulk writing to HBase on a secure cluster using Spark Streaming. With code and dependency changes I had it working with CDH 5.4 (HBase 1.0 and Spark 1.3). Sadly, since a recent upgrade to CDH 5.5 (Spark 1.5.0) it seems to have stopped working. The logs are not showing anything; the executors just don't report any logs or progress (while collecting in the driver and writing from there does work).
Does anybody have an alternative approach, or hints as to why this project would fail with the change from Spark 1.3 to 1.5?
01-02-2016 08:42 AM
Yes, it was merged into HBase 2.0.0 (https://issues.apache.org/jira/browse/HBASE-13992).
As far as I can tell, 2.0 seems a very long way off and I'm not seeing any backports. And even with this merged into HBase, the core issue here is having a reusable connection object on every executor with the correct Kerberos principal along with it.
It seems really weird to me that this Labs project is the only thing in Spark that allows connecting to HBase on a secure cluster. I know of SPARK-6918, but there has to be a better (less hacky) way.
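To make the "reusable connection object on every executor" point a bit more concrete, here is a minimal, dependency-free sketch of the pattern: a holder that creates a resource lazily, once per JVM, and hands the same instance to every task that asks for it. The class name `ExecutorLocal` and the factory are hypothetical and not taken from SparkOnHBase. In a real job the factory would presumably perform the Kerberos login (for example via Hadoop's `UserGroupInformation`) and create the HBase `Connection`; none of that is shown here.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: holds one lazily created resource per instance.
 * Kept in a static field on the executor side, this gives one shared
 * resource per executor JVM instead of one per task or per record.
 */
final class ExecutorLocal<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    ExecutorLocal(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use (thread-safe). */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // Assumption: a real factory would log in with the right
                    // principal and open the HBase connection at this point.
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

The idea is that code running inside something like `foreachPartition` calls `get()` on a statically held `ExecutorLocal` rather than capturing a connection in the closure, so nothing connection-related has to be serialized from the driver.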
01-08-2016 01:45 PM
01-11-2016 08:45 AM
Passing this along from the Labs team:
SparkOnHBase has made it into upstream HBase on the 2.x line. Any patches should ideally go to the Apache HBase community mailing lists and JIRA.
The engineering team is currently working on backporting and hardening SparkOnHBase for a future C5.x HBase release.