
CDH 5.7.0 HBase Spark Module


Rising Star

I just installed CDH 5.7.0 onto our test cluster to see what capabilities the new HBase Spark module can give us. I noticed that /opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.0.jar exists but is not in the classpath for Spark. I was wondering why it isn't included by default. Is it because we have HBase on a separate cluster outside of the Spark cluster?


Thanks,

Ben
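
A quick way to check whether the module is visible from spark-shell is to try loading one of its classes. This is only a sketch; the class name below is the upstream hbase-spark entry point and is an assumption about what ships in the CDH jar.

    // From spark-shell: throws ClassNotFoundException if the hbase-spark module
    // is not on the classpath, otherwise returns the Class object.
    Class.forName("org.apache.hadoop.hbase.spark.HBaseContext")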

12 REPLIES

Re: CDH 5.7.0 HBase Spark Module

Rising Star

I added hbase-spark-1.2.0-cdh5.7.1.jar to classpath.txt in /etc/spark/conf. This allowed me to use it with Spark, but now I cannot find the HBaseTableCatalog class. Does anyone know where I can import it from?


Thanks,

Ben
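
For what it's worth, in hbase-spark builds that include the DataFrame catalog support, HBaseTableCatalog typically lives under org.apache.hadoop.hbase.spark.datasources. A minimal sketch assuming that package and a hypothetical table layout (Spark 1.6 style, using the sqlContext predefined in spark-shell on CDH 5.7):

    // Sketch only: the package, table name, and column mapping below are assumptions.
    import org.apache.hadoop.hbase.spark.datasources.HBaseTableCatalog

    // JSON catalog describing a hypothetical table "my_table" with one column family.
    val catalog = """{
        |  "table":{"namespace":"default", "name":"my_table"},
        |  "rowkey":"key",
        |  "columns":{
        |    "key":{"cf":"rowkey", "col":"key", "type":"string"},
        |    "col1":{"cf":"cf1", "col":"col1", "type":"string"}
        |  }
        |}""".stripMargin

    // Read the table as a DataFrame through the hbase-spark data source.
    val df = sqlContext.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.hadoop.hbase.spark")
      .load()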

Re: CDH 5.7.0 HBase Spark Module

Expert Contributor

You should not have to copy any jars to make use of the hbase-spark module. You definitely should not copy them into /etc/spark/conf.


How is your cluster deployed? I.e., are you using CM or not?

Re: CDH 5.7.0 HBase Spark Module

Rising Star

Hi Busbey,


I installed CDH 5.7.1 just recently into production using CM 5.8.1. So, are you saying that I can use the hbase-spark module in spark-shell without including the jar in the classpath or in the arguments?


Thanks,

Ben

Re: CDH 5.7.0 HBase Spark Module

Expert Contributor
You may need to add jars to the classpath, but you shouldn't do that
by copying jars anywhere.

Do you already have an HBase GATEWAY role deployed on each node that
will run a Spark executor?

Are you using a parcels deployment or system packages?
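
For example, a non-copying way to try the module in a single session (a sketch, not a CDH-documented procedure) is to start the shell with the parcel's jar, e.g. spark-shell --jars /opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.1.jar, and let the HBase client config come from a GATEWAY deployment or other hbase-site.xml on the submitting host:

    // Assumes the session was started with the hbase-spark jar on the classpath
    // (e.g. via --jars) and that an HBase client config (hbase-site.xml) is
    // visible on the submitting host.
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.spark.HBaseContext

    val hbaseConf = HBaseConfiguration.create()        // reads hbase-site.xml from the classpath
    val hbaseContext = new HBaseContext(sc, hbaseConf) // entry point for the RDD-level API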

Re: CDH 5.7.0 HBase Spark Module

Rising Star

This is the command I issued on a client node for Spark and HBase to include the hbase-spark module in the Spark classpath.


echo "/opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.1.jar" >> /etc/spark/conf/classpath.txt


1. No, I don't have the HBase gateway role on every node of the Spark cluster. We have a separate HBase cluster. Do I need to copy the HBase config files to every node of the Spark cluster? I just did this for the Spark client node.

2. I am using parcels to deploy CDH.


Re: CDH 5.7.0 HBase Spark Module

Expert Contributor
It looks like the docs for the hbase-spark module in CDH 5.7 state that
an HBase client config is only needed on the node that submits the
Spark job, so that's a nice improvement. :)

http://www.cloudera.com/documentation/enterprise/5-7-x/topics/spark_integration.html

If your HBase cluster is separate, then yes, you'll need to copy the client
configs over.

You shouldn't manually alter the client configs CM places in
/etc/spark/conf, since it might rewrite them on a later deployment and
lose your changes.
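
If deploying an HBase GATEWAY role for the remote cluster isn't an option, one illustrative alternative (the path below is hypothetical) is to load that cluster's hbase-site.xml into the configuration explicitly instead of editing the CM-managed directories:

    // Sketch: point the HBase client config at the other cluster's hbase-site.xml
    // explicitly. The path below is hypothetical.
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.HBaseConfiguration

    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.addResource(new Path("/path/to/remote-hbase/hbase-site.xml"))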

Re: CDH 5.7.0 HBase Spark Module

Rising Star

I didn't alter any of the base Spark configs (/etc/spark/conf/spark-defaults.conf, /etc/spark/conf/spark-env.sh). What I meant is that I only touched the HBase configs (/etc/hbase/conf/hbase-site.xml, /etc/hbase/conf/hbase-env.sh) to sync them with the HBase settings on the HBase cluster.


We have Spark in production and use it for almost everything. This is the only area where we would like to switch over to the hbase-spark module and get away from using the HBase Java client in most cases and nerdhammer's HBase connector wherever possible. This would be a good way to unify the HBase-related Spark code.
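
As a sketch of what that unified code could look like with the module's RDD-level API (the table name, column family, and exact bulkPut signature below are assumptions and may differ slightly between hbase-spark versions):

    // Write an RDD to a hypothetical HBase table via HBaseContext.bulkPut.
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.util.Bytes

    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())
    val rows = sc.parallelize(Seq(("row1", "value1"), ("row2", "value2")))

    hbaseContext.bulkPut[(String, String)](
      rows,
      TableName.valueOf("my_table"),   // hypothetical table name
      { case (rowKey, value) =>
          new Put(Bytes.toBytes(rowKey))
            .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("col1"), Bytes.toBytes(value))
      }
    )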

Re: CDH 5.7.0 HBase Spark Module

Expert Contributor
Sure, but altering /etc/spark/conf/classpath.txt (or creating it) is
also altering the Spark client configs.

Re: CDH 5.7.0 HBase Spark Module

Rising Star

Yes, I agree. I do not want to do this either. But I just want to convey to the other Spark developers the steps for starting to use it without having to make that alteration. If this is still a work in progress by Cloudera to fully integrate it into their CDH parcels, I would like to know that too, so I can tell them to wait for a future release.