Created 10-17-2016 03:56 PM
We are planning to use the Spark-HBase connector (SHC) from Hortonworks for a new project: https://github.com/hortonworks-spark/shc
Since we are using Hortonworks HDP 2.4.2, the supported Spark version is 1.6.1.
Can we use this Spark-HBase connector jar with Spark 1.6.1?
Created 10-18-2016 03:06 AM
That is supported. I am sure you have already researched this connector. This article points out that Spark 1.6.1 is supported, but the connector works with practically any Spark version since 1.2: http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/. The GitHub repository confirms the same; look at the properties section of the pom.xml (https://github.com/hortonworks-spark/shc/blob/master/pom.xml):
<properties>
  <spark.version>1.6.1</spark.version>
  <hbase.version>1.1.2</hbase.version>
  ...
</properties>
Use the Spark-on-HBase connector as a standard Spark package.
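For reference, here is a minimal read sketch in Scala, run in spark-shell with the package loaded. The table name, column family, and column names are hypothetical; the catalog format follows the SHC README:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Hypothetical catalog: HBase table "table1" in the default namespace,
// row key "key", one column family "cf1" with a single string column "col1".
val catalog = s"""{
  |"table":{"namespace":"default", "name":"table1"},
  |"rowkey":"key",
  |"columns":{
    |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
    |"col1":{"cf":"cf1", "col":"col1", "type":"string"}
  |}
|}""".stripMargin

// Read the HBase table as a DataFrame through the SHC data source.
val df: DataFrame = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

df.show()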
+++
If the response was helpful, please vote and accept the best answer.
Created 10-18-2016 05:33 AM
Can I use this as a Maven dependency, or should I use it as a standard Spark package? What is the difference? I have never used a standard Spark package.
Created 10-18-2016 02:15 PM
To include the Spark-on-HBase connector as a standard Spark package, pass it to spark-shell, pyspark, or spark-submit with the --packages option:
> $SPARK_HOME/bin/spark-shell --packages zhzhan:shc:0.0.11-1.6.1-s_2.10
You can also include the package as a dependency in your SBT file. The format is spark-package-name:version:
spDependencies += "zhzhan/shc:0.0.11-1.6.1-s_2.10"
You can also use it as a Maven dependency.
All options are possible.
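As an illustration, a minimal SBT setup using the sbt-spark-package plugin might look like the following. The plugin version and resolver URL are assumptions, and the project name is hypothetical; check spark-packages.org for the current coordinates:

// project/plugins.sbt
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"  // assumed resolver URL
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.5")                   // assumed plugin version

// build.sbt
name := "spark-hbase-example"        // hypothetical project name
scalaVersion := "2.10.5"
sparkVersion := "1.6.1"              // setting provided by the sbt-spark-package plugin
sparkComponents += "sql"
spDependencies += "zhzhan/shc:0.0.11-1.6.1-s_2.10"

With this in place the connector is resolved from the spark-packages repository at build time instead of being passed on the command line.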
Created 10-24-2016 10:47 AM
I have followed the above steps and am using the package zhzhan/shc:0.0.11-1.6.1-s_2.10.
On executing the code, I am getting the following exception:
Exception in thread "main" java.io.IOException: java.lang.reflect.InvocationTargetException
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
  at org.apache.spark.sql.execution.datasources.hbase.RegionResource.init(HBaseResources.scala:93)
  at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.liftedTree1$1(HBaseResources.scala:57)
  at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.acquire(HBaseResources.scala:54)
  at org.apache.spark.sql.execution.datasources.hbase.RegionResource.acquire(HBaseResources.scala:88)
  at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.releaseOnException(HBaseResources.scala:74)
  at org.apache.spark.sql.execution.datasources.hbase.RegionResource.releaseOnException(HBaseResources.scala:88)
  at org.apache.spark.sql.execution.datasources.hbase.RegionResource.<init>(HBaseResources.scala:108)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:60)
  at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
  at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
  at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
  at org.apache.spark.sql.DataFrame$anonfun$org$apache$spark$sql$DataFrame$execute$1$1.apply(DataFrame.scala:1538)
  at org.apache.spark.sql.DataFrame$anonfun$org$apache$spark$sql$DataFrame$execute$1$1.apply(DataFrame.scala:1538)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
  at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2125)
  at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$execute$1(DataFrame.scala:1537)
  at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$collect(DataFrame.scala:1544)
  at org.apache.spark.sql.DataFrame$anonfun$head$1.apply(DataFrame.scala:1414)
  at org.apache.spark.sql.DataFrame$anonfun$head$1.apply(DataFrame.scala:1413)
  at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2138)
  at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413)
  at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495)
  at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:171)
  at org.apache.spark.sql.DataFrame.show(DataFrame.scala:394)
  at org.apache.spark.sql.DataFrame.show(DataFrame.scala:355)
  at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
  at com.sparhbaseintg.trnsfm.HBasesrc$.main(Hbasesrc.scala:83)
  at com.sparhbaseintg.trnsfm.HBasesrc.main(Hbasesrc.scala)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
  ... 44 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.RpcRetryingCallerFactory.instantiate(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hbase/client/ServerStatisticTracker;)Lorg/apache/hadoop/hbase/client/RpcRetryingCallerFactory;
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.createAsyncProcess(ConnectionManager.java:2242)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:690)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:630)
I can see the org.apache.hadoop.hbase.client.RpcRetryingCallerFactory.instantiate method in the hbase-client jar. I am not sure why it is not being resolved.
Please help.
Thanks!
Created 10-24-2016 11:35 AM
Is HBase running? Do you have a firewall blocking it?
What JDK are you using? Perhaps an incompatible version? Any other logs or details you can share?
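Also, a NoSuchMethodError like this usually means a different hbase-client version is on the classpath than the one the connector was compiled against. A quick check you could run in spark-shell (a sketch, just standard Java reflection) to see which jar the class is actually loaded from:

// Print the jar that RpcRetryingCallerFactory is loaded from,
// to rule out an older HBase client earlier on the classpath.
println(classOf[org.apache.hadoop.hbase.client.RpcRetryingCallerFactory]
  .getProtectionDomain.getCodeSource.getLocation)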
Created 10-25-2016 03:09 PM
Adding the hbase-client JAR fixed the issue.
Thanks Timothy!
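For anyone who hits the same NoSuchMethodError: one way to make sure the right client ends up on the classpath is to pin the HBase client version the connector was built against (1.1.2 in the SHC pom.xml). A sketch, assuming an SBT build; adjust the version to match your cluster:

// build.sbt -- keep hbase-client in sync with the hbase.version in the SHC pom.xml
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.1.2"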