Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference; information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Spark HBase connector for Spark 1.6.1

New Member

We are planning to use the Spark HBase connector (SHC) from Hortonworks for a new project: https://github.com/hortonworks-spark/shc

Since we are using Hortonworks HDP 2.4.2, the supported Spark version is 1.6.1.

Can we use this Spark-HBase connector JAR with Spark 1.6.1?

1 ACCEPTED SOLUTION

Super Guru

@Sankaraiah Narayanasamy

That is supported. I am sure you have already researched this connector. This article points out that Spark 1.6.1 is supported, and that the connector works with practically any Spark version since 1.2: http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/. The GitHub repository confirms the same; look at the properties section of the pom.xml (https://github.com/hortonworks-spark/shc/blob/master/pom.xml):

<properties>
    <spark.version>1.6.1</spark.version>
    <hbase.version>1.1.2</hbase.version>
    ...
</properties>

Use the Spark-on-HBase connector as a standard Spark package.
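As a minimal sketch of what usage then looks like, reads go through the SHC data source with a JSON catalog that maps HBase columns to DataFrame fields. The table name, column family, and columns below are hypothetical; see the SHC README for the full catalog format:

// In spark-shell on Spark 1.6.1, sqlContext is already in scope.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// JSON catalog mapping a (hypothetical) HBase table "table1" to a DataFrame schema.
val catalog =
  """{
    |  "table":{"namespace":"default", "name":"table1"},
    |  "rowkey":"key",
    |  "columns":{
    |    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    |    "col1":{"cf":"cf1", "col":"col1", "type":"string"}
    |  }
    |}""".stripMargin

// Load the table as a DataFrame through the SHC data source.
val df: DataFrame = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

df.show()

The "rowkey" column family is a convention SHC uses to expose the HBase row key as a regular DataFrame column.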


If the response was helpful, please vote and accept the best answer.


6 REPLIES


New Member
@Constantin Stanca: Can I use this as a Maven dependency, or should I use it as a standard Spark package? What is the difference? I have never used a standard Spark package.

Super Guru

@Sankaraiah Narayanasamy

To include the Spark-on-HBase connector as a standard Spark package, pass it to spark-shell, pyspark, or spark-submit with the --packages flag:

$SPARK_HOME/bin/spark-shell --packages zhzhan:shc:0.0.11-1.6.1-s_2.10

You can also include the package as a dependency in your SBT file. The format is spark-package-name:version:

spDependencies += "zhzhan/shc:0.0.11-1.6.1-s_2.10"

You can also use it as a Maven dependency (a sketch follows below).

All options are possible.
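For the Maven route, here is a sketch of the pom.xml entries, assuming the artifact follows the usual spark-packages convention (groupId = GitHub user, artifactId = repository name) and is served from the spark-packages Maven repository; verify the coordinates before relying on them:

<repositories>
    <repository>
        <id>spark-packages</id>
        <url>https://dl.bintray.com/spark-packages/maven/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>zhzhan</groupId>
        <artifactId>shc</artifactId>
        <version>0.0.11-1.6.1-s_2.10</version>
    </dependency>
</dependencies>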

New Member

I have followed the above steps and am using the JAR zhzhan/shc:0.0.11-1.6.1-s_2.10.

On executing the code, I am getting the following exception:

Exception in thread "main" java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
    at org.apache.spark.sql.execution.datasources.hbase.RegionResource.init(HBaseResources.scala:93)
    at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.liftedTree1$1(HBaseResources.scala:57)
    at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.acquire(HBaseResources.scala:54)
    at org.apache.spark.sql.execution.datasources.hbase.RegionResource.acquire(HBaseResources.scala:88)
    at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.releaseOnException(HBaseResources.scala:74)
    at org.apache.spark.sql.execution.datasources.hbase.RegionResource.releaseOnException(HBaseResources.scala:88)
    at org.apache.spark.sql.execution.datasources.hbase.RegionResource.<init>(HBaseResources.scala:108)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:60)
    at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
    at org.apache.spark.sql.DataFrame$anonfun$org$apache$spark$sql$DataFrame$execute$1$1.apply(DataFrame.scala:1538)
    at org.apache.spark.sql.DataFrame$anonfun$org$apache$spark$sql$DataFrame$execute$1$1.apply(DataFrame.scala:1538)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2125)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$execute$1(DataFrame.scala:1537)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$collect(DataFrame.scala:1544)
    at org.apache.spark.sql.DataFrame$anonfun$head$1.apply(DataFrame.scala:1414)
    at org.apache.spark.sql.DataFrame$anonfun$head$1.apply(DataFrame.scala:1413)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2138)
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413)
    at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495)
    at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:171)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:394)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:355)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
    at com.sparhbaseintg.trnsfm.HBasesrc$.main(Hbasesrc.scala:83)
    at com.sparhbaseintg.trnsfm.HBasesrc.main(Hbasesrc.scala)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    ... 44 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.RpcRetryingCallerFactory.instantiate(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hbase/client/ServerStatisticTracker;)Lorg/apache/hadoop/hbase/client/RpcRetryingCallerFactory;
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.createAsyncProcess(ConnectionManager.java:2242)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:690)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:630)

I can see the org.apache.hadoop.hbase.client.RpcRetryingCallerFactory.instantiate method in the hbase-client JAR, so I am not sure why it is not being resolved.

Please help.

Thanks!

Master Guru

Is HBase running? Do you have a firewall blocking it?

What JDK are you using? Perhaps an incompatible version? Any other logs or details you can share?

New Member

Adding the hbase-client JAR to the classpath fixed the issue.
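For anyone hitting the same NoSuchMethodError, the cluster's hbase-client JAR can be supplied at submit time along these lines. This is a sketch: the JAR path is an assumption based on a typical HDP 2.4 layout, the class name is taken from the stack trace above, and my-application.jar stands in for your application JAR:

$SPARK_HOME/bin/spark-submit \
  --packages zhzhan:shc:0.0.11-1.6.1-s_2.10 \
  --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar \
  --class com.sparhbaseintg.trnsfm.HBasesrc \
  my-application.jar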

Thanks, Timothy!