I just installed CDH 5.7.0 onto our test cluster to see what capabilities the new HBase Spark Module can give us. I noticed that /opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.0.jar exists but is not in the classpath for Spark. I was wondering why it isn't by default. Is it because we have HBase on a separate cluster outside of the Spark cluster?
I added the hbase-spark-1.2.0-cdh5.7.1.jar to the classpath.txt in /etc/spark/conf. This allowed me to use it with spark, but now I cannot find the HBaseTableCatalog class. Does anyone know where I can import it from?
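Before hunting for the right import, it can help to check whether the class is packaged in that jar at all. The following is a minimal sketch, assuming `unzip` is available on the node; the helper name `jar_has_class` is made up for illustration and is not part of CDH or Spark. The jar path is the one mentioned above.

```shell
# Hypothetical helper: succeeds if the jar contains a .class entry
# whose simple name matches the second argument.
jar_has_class() {
  jar_path="$1"
  class_name="$2"
  # unzip -l lists the archive entries; grep -q only sets the exit status.
  unzip -l "$jar_path" 2>/dev/null | grep -q "/${class_name}\.class"
}

HBASE_SPARK_JAR=/opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.1.jar

if jar_has_class "$HBASE_SPARK_JAR" HBaseTableCatalog; then
  echo "HBaseTableCatalog is packaged in $HBASE_SPARK_JAR"
else
  echo "HBaseTableCatalog was not found in $HBASE_SPARK_JAR"
fi
```

If the class is reported as not found, the import cannot succeed from this jar regardless of how the classpath is set up. To see the full package path when it is found, run `unzip -l` on the jar and grep for the class name directly.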
You should not have to copy any jars to make use of the hbase-spark module. You definitely should not copy them into /etc/spark/conf.
How is your cluster deployed? That is, are you using Cloudera Manager (CM) or not?
I installed CDH 5.7.1 just recently into production using CM 5.8.1. So, are you saying that I can use the hbase-spark module in spark shell without any inclusion of the jar in the classpath or in the arguments?
This is the command I issued on a client node for Spark and HBase to include the hbase-spark module in the Spark classpath.
echo "/opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.1.jar" >> /etc/spark/conf/classpath.txt
2. I am using parcels to deploy CDH.
I didn't alter any of the base Spark configs (/etc/spark/conf/spark-defaults.conf, /etc/spark/conf/spark-env.sh). What I meant is that I only touched the HBase configs (/etc/hbase/conf/hbase-site.xml, /etc/hbase/conf/hbase-env.sh) to sync them with the HBase settings on the HBase cluster.
We have Spark in production and use it for almost everything. This is the only area where we would like to switch over to the hbase-spark module, moving away from the HBase Java client in most cases and from nerdammer's HBase connector wherever possible. That would be a good way to unify our HBase-related Spark code.
Yes, I agree. I do not want to do this either. I just want to be able to tell the other Spark developers how to begin using the module without making that alteration. If Cloudera is still working on fully integrating this into the CDH parcels, I would like to know that too, so I can tell them to wait for a future release.
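One way to try the module without touching /etc/spark/conf/classpath.txt is to pass the jar for a single session with the standard `--jars` option of spark-shell. This is only a sketch: it assumes the jar's own dependencies (the HBase client jars and hbase-site.xml) are already visible to Spark, which this thread does not confirm. The function name `spark_shell_with_jar` is hypothetical; it prints the command rather than running it so it can be reviewed and shared first.

```shell
HBASE_SPARK_JAR=/opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.1.jar

# Hypothetical helper: print the spark-shell invocation for a given jar
# instead of executing it.
spark_shell_with_jar() {
  printf 'spark-shell --jars %s\n' "$1"
}

spark_shell_with_jar "$HBASE_SPARK_JAR"
```

Running the printed command starts a shell with the jar on the driver and executor classpaths for that session only, so nothing under /etc/spark/conf changes.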