<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Java Spark Program and Hive table backed by HBase table in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132189#M47678</link>
    <description>&lt;P&gt;The solution was as follows.&lt;/P&gt;&lt;P&gt;Spark provides a sample HBase test program in /usr/hdp/current/spark-client/examples/src/main/scala/org/apache/spark/examples.&lt;/P&gt;&lt;P&gt;The program is HBaseTest.scala. If you open this file, you will see the comment:&lt;/P&gt;&lt;P&gt;    // please ensure HBASE_CONF_DIR is on classpath of spark driver &lt;/P&gt;&lt;P&gt;    // e.g: set it through spark.driver.extraClassPath property &lt;/P&gt;&lt;P&gt;    // in spark-defaults.conf or through --driver-class-path &lt;/P&gt;&lt;P&gt;    // command line option of spark-submit&lt;/P&gt;&lt;P&gt;So I added that parameter, and my command line became:&lt;/P&gt;&lt;P&gt;spark-submit --jars hive-hbase-handler.jar,hbase-client.jar,hbase-common.jar,hbase-hadoop-compat.jar,hbase-hadoop2-compat.jar,hbase-protocol.jar,hbase-server.jar,metrics-core.jar,guava.jar --driver-class-path postgresql.jar --master yarn-client --files /usr/hdp/current/hbase-client/conf/hbase-site.xml --class SparkJS --driver-class-path /etc/hbase/2.5.0.0-1245/0 spark-js-1.jar&lt;/P&gt;&lt;P&gt;The issue is gone, and I can now do what I need to do.&lt;/P&gt;</description>
    <pubDate>Thu, 08 Dec 2016 08:33:25 GMT</pubDate>
    <dc:creator>shigeru_takehar</dc:creator>
    <dc:date>2016-12-08T08:33:25Z</dc:date>
    <item>
      <title>Java Spark Program and Hive table backed by HBase table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132186#M47675</link>
      <description>&lt;P&gt;I have a Hive table that is backed by an HBase table. Querying it from the Hive command line works fine; however, when I try the same from Spark Java code, where I create a DataFrame from a select statement and call its show method, I see the following messages repeat forever:&lt;/P&gt;&lt;P&gt;16/11/30 19:40:31 INFO ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15802d56675006a, negotiated timeout = 90000 &lt;/P&gt;&lt;P&gt;16/11/30 19:40:31 INFO RegionSizeCalculator: Calculating region sizes for table "st_tbl_1". &lt;/P&gt;&lt;P&gt;16/11/30 19:41:19 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=48332 ms ago, cancelled=false, msg= &lt;/P&gt;&lt;P&gt;16/11/30 19:41:40 INFO RpcRetryingCaller: Call exception, tries=11, retries=35, started=68473 ms ago, cancelled=false, msg= &lt;/P&gt;&lt;P&gt;16/11/30 19:42:00 INFO RpcRetryingCaller: Call exception, tries=12, retries=35, started=88545 ms ago, cancelled=false, msg= &lt;/P&gt;&lt;P&gt;16/11/30 19:42:20 INFO RpcRetryingCaller: Call exception, tries=13, retries=35, started=108742 ms ago, cancelled=false, msg=&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 08:53:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132186#M47675</guid>
      <dc:creator>shigeru_takehar</dc:creator>
      <dc:date>2016-12-01T08:53:52Z</dc:date>
    </item>
    <item>
      <title>Re: Java Spark Program and Hive table backed by HBase table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132187#M47676</link>
      <description>&lt;P&gt;I typically don't recommend using Hive atop HBase; performance is terrible once you get into high data volumes. Instead, you could create your HBase tables, access the data programmatically from Spark through the DataFrames API, and use Phoenix to create a view atop HBase for SQL analytics. Phoenix is orders of magnitude faster than Hive for SQL on HBase, and it's easy to use: you use Phoenix to define a schema over the HBase table. Try it out.&lt;/P&gt;&lt;P&gt;Hortonworks also released a DataFrame-based Spark-on-HBase connector that you can use:&lt;/P&gt;&lt;P&gt;&lt;A href="http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/" target="_blank"&gt;http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 10:50:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132187#M47676</guid>
      <dc:creator>bmathew</dc:creator>
      <dc:date>2016-12-01T10:50:50Z</dc:date>
    </item>
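    <!-- Editor's sketch: the Phoenix approach recommended above amounts to declaring a
         SQL schema over an existing HBase table. A minimal illustration follows; the
         table name "st_tbl_1" is taken from the question's logs, while the row-key and
         column names (pk, cf1, col1, col2) are hypothetical placeholders, not from this
         thread. Run such DDL in the Phoenix sqlline.py client. -->

```sql
-- Map an existing HBase table "st_tbl_1" into Phoenix as a view.
-- The HBase row key becomes the primary key; "cf1"."col1" maps to
-- column family cf1, qualifier col1 (names here are illustrative).
CREATE VIEW "st_tbl_1" (
    pk VARCHAR PRIMARY KEY,
    "cf1"."col1" VARCHAR,
    "cf1"."col2" VARCHAR
);

-- The view can then be queried with standard SQL:
SELECT pk, "cf1"."col1" FROM "st_tbl_1" LIMIT 10;
```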
    <item>
      <title>Re: Java Spark Program and Hive table backed by HBase table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132188#M47677</link>
      <description>&lt;P&gt;Thank you for the recommendation, but I would like to solve this issue first.&lt;/P&gt;&lt;P&gt;We are using HDP 2.5. Previously, on HDP 2.3, I could not run Spark with Phoenix. Does HDP 2.5 allow us to use Phoenix with Spark 1.6.2?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 13:19:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132188#M47677</guid>
      <dc:creator>shigeru_takehar</dc:creator>
      <dc:date>2016-12-01T13:19:32Z</dc:date>
    </item>
    <item>
      <title>Re: Java Spark Program and Hive table backed by HBase table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132189#M47678</link>
      <description>&lt;P&gt;The solution was as follows.&lt;/P&gt;&lt;P&gt;Spark provides a sample HBase test program in /usr/hdp/current/spark-client/examples/src/main/scala/org/apache/spark/examples.&lt;/P&gt;&lt;P&gt;The program is HBaseTest.scala. If you open this file, you will see the comment:&lt;/P&gt;&lt;P&gt;    // please ensure HBASE_CONF_DIR is on classpath of spark driver &lt;/P&gt;&lt;P&gt;    // e.g: set it through spark.driver.extraClassPath property &lt;/P&gt;&lt;P&gt;    // in spark-defaults.conf or through --driver-class-path &lt;/P&gt;&lt;P&gt;    // command line option of spark-submit&lt;/P&gt;&lt;P&gt;So I added that parameter, and my command line became:&lt;/P&gt;&lt;P&gt;spark-submit --jars hive-hbase-handler.jar,hbase-client.jar,hbase-common.jar,hbase-hadoop-compat.jar,hbase-hadoop2-compat.jar,hbase-protocol.jar,hbase-server.jar,metrics-core.jar,guava.jar --driver-class-path postgresql.jar --master yarn-client --files /usr/hdp/current/hbase-client/conf/hbase-site.xml --class SparkJS --driver-class-path /etc/hbase/2.5.0.0-1245/0 spark-js-1.jar&lt;/P&gt;&lt;P&gt;The issue is gone, and I can now do what I need to do.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Dec 2016 08:33:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132189#M47678</guid>
      <dc:creator>shigeru_takehar</dc:creator>
      <dc:date>2016-12-08T08:33:25Z</dc:date>
    </item>
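    <!-- Editor's sketch: the essence of the fix above is making hbase-site.xml visible
         to the Spark driver. The HBaseTest.scala comment quoted in the post names two
         equivalent ways to do that; both are sketched below as config fragments. Paths
         are illustrative for an HDP install, and SparkJS / spark-js-1.jar are the
         poster's names; adjust for your cluster. -->

```shell
# Option 1: one-off, via spark-submit flags -- ship hbase-site.xml to the
# executors and put the HBase conf directory on the driver classpath.
spark-submit \
  --master yarn-client \
  --files /usr/hdp/current/hbase-client/conf/hbase-site.xml \
  --driver-class-path /usr/hdp/current/hbase-client/conf \
  --class SparkJS \
  spark-js-1.jar

# Option 2: permanent, via spark-defaults.conf -- the property the
# HBaseTest.scala comment refers to:
#   spark.driver.extraClassPath /usr/hdp/current/hbase-client/conf
```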
    <item>
      <title>Re: Java Spark Program and Hive table backed by HBase table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132190#M47679</link>
      <description>&lt;P&gt;To confirm: the issue was that the HBase configuration was not available to Spark. You can also check the Spark HBase Connector we support at &lt;A href="https://github.com/hortonworks-spark/shc" target="_blank"&gt;https://github.com/hortonworks-spark/shc&lt;/A&gt;. It has many features, and it also documents the configuration needed for Spark HBase access, including the security aspects.&lt;/P&gt;</description>
      <pubDate>Fri, 09 Dec 2016 10:47:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Java-Spark-Program-and-Hive-table-backed-by-HBase-table/m-p/132190#M47679</guid>
      <dc:creator>bikas</dc:creator>
      <dc:date>2016-12-09T10:47:20Z</dc:date>
    </item>
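    <!-- Editor's sketch: the usual way to pull a published connector such as shc into a
         spark-submit run is the --packages flag. The artifact coordinates and repository
         URL below are illustrative assumptions, not taken from this thread -- check the
         shc README for the coordinates matching your Spark and HBase versions. YourApp
         and your-app.jar are placeholders. -->

```shell
spark-submit \
  --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories https://repo.hortonworks.com/content/groups/public \
  --files /usr/hdp/current/hbase-client/conf/hbase-site.xml \
  --class YourApp \
  your-app.jar
```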
  </channel>
</rss>

