Expert Contributor
Posts: 61
Registered: ‎02-03-2016

CDH 5.7.0 HBase Spark Module

I just installed CDH 5.7.0 on our test cluster to see what capabilities the new HBase-Spark module can give us. I noticed that /opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.0.jar exists but is not on the Spark classpath. I was wondering why it isn't there by default. Is it because we have HBase on a separate cluster, outside of the Spark cluster?

 

Thanks,

Ben

Expert Contributor
Posts: 61
Registered: ‎02-03-2016

Re: CDH 5.7.0 HBase Spark Module

I added hbase-spark-1.2.0-cdh5.7.1.jar to classpath.txt in /etc/spark/conf. This allowed me to use it with Spark, but now I cannot find the HBaseTableCatalog class. Does anyone know where I can import it from?
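
For reference, in the upstream Apache hbase-spark module the catalog class lives under the datasources package; whether the CDH 5.7.x jar actually ships that class is exactly what I can't confirm from the jar. A sketch of the imports I am trying:

// Upstream Apache hbase-spark layout -- may or may not be present in hbase-spark-1.2.0-cdh5.7.1.jar
import org.apache.hadoop.hbase.spark.datasources.HBaseTableCatalog

// The entry point the Cloudera docs describe for this module is HBaseContext
import org.apache.hadoop.hbase.spark.HBaseContext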

 

Thanks,

Ben

Cloudera Employee
Posts: 88
Registered: ‎01-08-2014

Re: CDH 5.7.0 HBase Spark Module

You should not have to copy any jars to make use of the hbase-spark module, and you definitely should not copy them into /etc/spark/conf.

 

How is your cluster deployed, i.e. are you using CM or not?

Expert Contributor
Posts: 61
Registered: ‎02-03-2016

Re: CDH 5.7.0 HBase Spark Module

Hi Busbey,

 

I recently installed CDH 5.7.1 into production using CM 5.8.1. So, are you saying that I can use the hbase-spark module in spark-shell without adding the jar to the classpath or passing it in the arguments?

 

Thanks,

Ben

Cloudera Employee
Posts: 88
Registered: ‎01-08-2014

Re: CDH 5.7.0 HBase Spark Module

You may need to add jars to the classpath, but you shouldn't do that
by copying jars anywhere.

Do you already have an HBase GATEWAY role deployed on each node that
will run a Spark executor?

Are you using a parcels deployment or system packages?
Expert Contributor
Posts: 61
Registered: ‎02-03-2016

Re: CDH 5.7.0 HBase Spark Module

This is the command I issued on a node that is a client for both Spark and HBase, to include the hbase-spark module in the Spark classpath.

 

 

echo "/opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.1.jar" >> /etc/spark/conf/classpath.txt

 

1. No, I don't have the HBase gateway role on every node of the Spark cluster. We have a separate HBase cluster. Do I need to copy the HBase config files to every node of the Spark cluster? So far I have only done this for the Spark client node (see the sketch after this list).

2. I am using parcels to deploy CDH.
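
For what it's worth, this is roughly how the driver on the client node picks up that copied hbase-site.xml and hands it to the module; the paths come from this thread and the constructor from the upstream module, so treat it as a sketch rather than verified CDH 5.7.1 code:

// Minimal sketch (Scala, spark-shell): load the client-side hbase-site.xml synced to the
// Spark client node and hand it to the hbase-spark module's HBaseContext.
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext

val hbaseConf = HBaseConfiguration.create()
// Point explicitly at the config copied from the separate HBase cluster.
hbaseConf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))

// sc is the SparkContext provided by spark-shell.
val hbaseContext = new HBaseContext(sc, hbaseConf)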

 

Cloudera Employee
Posts: 88
Registered: ‎01-08-2014

Re: CDH 5.7.0 HBase Spark Module

It looks like the docs for the hbase-spark module in CDH 5.7 state that
an HBase client config is only needed on the node that submits the
Spark job, so that's a nice improvement. :)

http://www.cloudera.com/documentation/enterprise/5-7-x/topics/spark_integration.html

If your HBase cluster is separate, then yes, you'll need to copy the
client configs over.

You shouldn't manually alter the client configs CM places in
/etc/spark/conf, since it might rewrite them on a later deployment and
lose your changes.
Expert Contributor
Posts: 61
Registered: ‎02-03-2016

Re: CDH 5.7.0 HBase Spark Module

[ Edited ]

I didn't alter any of the base Spark configs (/etc/spark/conf/spark-defaults.conf, /etc/spark/conf/spark-env.sh). What I meant is that I only touched the HBase configs (/etc/hbase/conf/hbase-site.xml, /etc/hbase/conf/hbase-env.sh) to sync them with the HBase settings on the HBase cluster.

 

We have Spark in production and use it for almost everything. This is the one area where we would like to switch over to the hbase-spark module, getting away from the HBase Java client in most cases and from nerdammer's HBase connector wherever possible. This would be a good way to unify the HBase-related Spark code.
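
To make the "unify the code" point concrete, here is a minimal sketch of what a scan would look like through the module instead of the raw HBase Java client; the table name is a made-up placeholder and the exact method signatures should be checked against the CDH jar:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.spark.HBaseContext

// sc is the spark-shell SparkContext; the config comes from hbase-site.xml on the classpath.
val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

// Full-table scan of a placeholder table, returned as an RDD of (rowkey, Result) pairs.
val scanRdd = hbaseContext.hbaseRDD(TableName.valueOf("my_table"), new Scan())
println(scanRdd.count())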

Cloudera Employee
Posts: 88
Registered: ‎01-08-2014

Re: CDH 5.7.0 HBase Spark Module

Sure, but altering /etc/spark/conf/classpath.txt (or creating it) is
also altering the Spark client configs.