Created 12-01-2016 12:53 AM
I have a Hive table that is backed by an HBase table. Viewing the data works fine from the Hive command line; however, when I try the same thing in Spark Java code, where I create a DataFrame from a select statement and call its show method, I see the following messages repeating forever:
16/11/30 19:40:31 INFO ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15802d56675006a, negotiated timeout = 90000
16/11/30 19:40:31 INFO RegionSizeCalculator: Calculating region sizes for table "st_tbl_1".
16/11/30 19:41:19 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=48332 ms ago, cancelled=false, msg=
16/11/30 19:41:40 INFO RpcRetryingCaller: Call exception, tries=11, retries=35, started=68473 ms ago, cancelled=false, msg=
16/11/30 19:42:00 INFO RpcRetryingCaller: Call exception, tries=12, retries=35, started=88545 ms ago, cancelled=false, msg=
16/11/30 19:42:20 INFO RpcRetryingCaller: Call exception, tries=13, retries=35, started=108742 ms ago, cancelled=false, msg=
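For context, my Java code is roughly the following minimal sketch (class name, table name, and app name here are placeholders, not my exact job; it needs a Spark 1.x cluster with Hive support to run):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SparkJS {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkJS");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc.sc());

        // Select from the Hive table that is backed by HBase.
        // show() triggers the HBase scan, which is where the
        // RpcRetryingCaller messages above loop forever.
        DataFrame df = hiveContext.sql("SELECT * FROM st_tbl_1");
        df.show();
    }
}
```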
Created 12-08-2016 12:33 AM
The solution was:
Spark ships a sample HBase test program in /usr/hdp/current/spark-client/examples/src/main/scala/org/apache/spark/examples.
The program is HBaseTest.scala. If you open this file, you will see the comment:
// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit
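In other words, the HBase client configuration directory has to be on the driver's classpath. One way to make this permanent is an entry in spark-defaults.conf (the path below is an example; use your cluster's HBase conf directory):

```
spark.driver.extraClassPath /etc/hbase/conf
```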
So I added that parameter, and my command line became:
spark-submit \
  --jars hive-hbase-handler.jar,hbase-client.jar,hbase-common.jar,hbase-hadoop-compact.jar,hbase-hadoop2-compact.jar,hbase-protocol.jar,hbase-server.jar,metrics-core.jar,guava.jar \
  --driver-class-path postgresql.jar \
  --master yarn-client \
  --files /usr/hdp/current/hbase-client/conf/hbase-site.xml \
  --class SparkJS \
  --driver-class-path /etc/hbase/2.5.0.0-1245/0 \
  spark-js-1.jar
The issue is gone and I can do what I need to do.
Created 12-01-2016 02:50 AM
I typically don't recommend using Hive atop HBase. The performance is terrible when you start getting into high data volumes. You could create your HBase tables and then use Spark to access data programmatically using the Data Frames API, and use Phoenix to create a view atop HBase for SQL analytics. Phoenix is orders of magnitude faster than Hive for SQL on HBase. Try it out. It's easy to use. You use Phoenix to create a schema for the HBase table.
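To illustrate, mapping an existing HBase table into Phoenix looks something like this (the table, column family, and column names below are made up; Phoenix maps columns with the "FAMILY"."QUALIFIER" convention):

```sql
-- Map an existing HBase table into Phoenix as a read view.
-- The primary key column maps to the HBase row key.
CREATE VIEW "st_tbl_1" (
    pk VARCHAR PRIMARY KEY,
    "cf"."col1" VARCHAR,
    "cf"."col2" VARCHAR
);
```

Once the view exists, you can run SQL analytics against it through any Phoenix client.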
Hortonworks released a Spark on HBase connector that you can use. It's a DataFrame based connector:
http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
Created 12-01-2016 05:19 AM
Thank you for the recommendation, but I would like to solve this issue first.
We are using HDP 2.5. Previously we used HDP 2.3, where I could not run Spark with Phoenix. Does HDP 2.5 allow us to use Phoenix with Spark 1.6.2?
Created 12-09-2016 02:47 AM
To confirm, the issue was that the HBase configuration was not available to Spark. You can also check the Spark HBase Connector we support at https://github.com/hortonworks-spark/shc. It has many features and also documents the configuration for Spark HBase access, including the security aspects.
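For reference, reading an HBase table through SHC looks roughly like this minimal Java sketch (the catalog JSON and all table/column names are illustrative; see the repo's README for the exact catalog format, and note it needs a running cluster):

```java
// Illustrative only: reading HBase via SHC's DataFrame data source.
// The catalog JSON maps the HBase row key and column families/qualifiers
// to DataFrame columns.
String catalog = "{"
    + "\"table\":{\"namespace\":\"default\", \"name\":\"st_tbl_1\"},"
    + "\"rowkey\":\"key\","
    + "\"columns\":{"
    + "\"pk\":{\"cf\":\"rowkey\", \"col\":\"key\", \"type\":\"string\"},"
    + "\"col1\":{\"cf\":\"cf\", \"col\":\"col1\", \"type\":\"string\"}"
    + "}}";

DataFrame df = sqlContext.read()
    .option("catalog", catalog)
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load();
df.show();
```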