Created 01-17-2021 04:51 AM
I am new to Spark and want to read/write data to/from HBase tables, but I encountered an error while reading from an HBase table.
Versions: Spark: 2.4.7; HBase: 1.4.13; Scala: 2.11.12
Command:
spark-shell --jars /usr/lib/hbase/shc/core/target/shc-core-1.1.3-2.4-s_2.11.jar,/usr/lib/hbase/lib/htrace-core4-4.1.0-incubating.jar,/usr/lib/hbase/hbase-client-2.4.0.jar,/usr/lib/hbase/hbase-common-2.4.0.jar,/usr/lib/hbase/hbase-server-2.4.0.jar,/usr/lib/hbase/hbase-protocol-2.4.0.jar,/usr/lib/hbase/hbase-shaded-miscellaneous-2.2.1.jar,/usr/lib/hbase/hbase-protocol-shaded-2.4.0.jar
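Read code (illustrative; the original post does not include it, so the table, namespace, and column names below are placeholders):
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Catalog JSON mapping the HBase table to a DataFrame schema (placeholder names)
val catalog = """{
  "table":{"namespace":"default", "name":"table1"},
  "rowkey":"key",
  "columns":{
    "id":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf1", "col":"col1", "type":"string"}
  }
}"""

// Standard SHC read; show() triggers the table scan that fails below
val df = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
df.show()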
Error:
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:232)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:128)
at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$$anonfun$getConnection$1.apply(HBaseConnectionCache.scala:144)
at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$$anonfun$getConnection$1.apply(HBaseConnectionCache.scala:144)
at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$$anonfun$1.apply(HBaseConnectionCache.scala:135)
at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$$anonfun$1.apply(HBaseConnectionCache.scala:133)
at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:79)
at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$.getConnection(HBaseConnectionCache.scala:133)
at org.apache.spark.sql.execution.datasources.hbase.HBaseConnectionCache$.getConnection(HBaseConnectionCache.scala:144)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource.init(HBaseResources.scala:96)
at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.liftedTree1$1(HBaseResources.scala:60)
at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.acquire(HBaseResources.scala:57)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource.acquire(HBaseResources.scala:91)
at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.releaseOnException(HBaseResources.scala:77)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource.releaseOnException(HBaseResources.scala:91)
at org.apache.spark.sql.execution.datasources.hbase.RegionResource.<init>(HBaseResources.scala:111)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:384)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3416)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2553)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2553)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3391)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3390)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2553)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2767)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:256)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:293)
at org.apache.spark.sql.Dataset.show(Dataset.scala:754)
at org.apache.spark.sql.Dataset.show(Dataset.scala:713)
at org.apache.spark.sql.Dataset.show(Dataset.scala:722)
... 55 elided
Caused by: java.lang.reflect.UndeclaredThrowableException: java.lang.reflect.InvocationTargetException: java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/protobuf/RpcController
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1944)
at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:228)
... 116 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/protobuf/RpcController
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:230)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
... 118 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/protobuf/RpcController
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:286)
... 126 more
Caused by: java.lang.ClassNotFoundException: org.apache.hbase.thirdparty.com.google.protobuf.RpcController
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 138 more
Please advise how to resolve this so that I can read/write data to/from HBase using Spark.
Created 01-20-2021 04:59 AM
@Paarth You might have to adjust the jars. Check a similar issue here: https://community.cloudera.com/t5/Support-Questions/java-lang-ClassNotFoundException-org-apache-hbas...
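The missing class in the trace, org.apache.hbase.thirdparty.com.google.protobuf.RpcController, is the relocated protobuf that ships in the hbase-shaded-protobuf jar from the hbase-thirdparty project, so the likely fix is to add that jar to --jars as well (for example hbase-shaded-protobuf-2.2.1.jar, the sibling of the hbase-shaded-miscellaneous-2.2.1.jar already listed). Note also that the command passes HBase 2.4.0 client jars while the post states HBase 1.4.13; mismatched client and server versions are another common cause of errors like this.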
Created 05-16-2023 02:08 AM
Hi @Paarth
The Spark HBase Connector (SHC) is not supported in CDP. You need to use the HBase Spark Connector instead to access HBase data from Spark.
You can find a sample reference below:
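For illustration, a minimal read/write with the HBase Spark Connector might look like the sketch below (table name and column mapping are placeholders; it assumes the connector jars and hbase-site.xml are on the classpath):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext

// Register an HBaseContext so the connector can create HBase connections
val conf = HBaseConfiguration.create()
new HBaseContext(spark.sparkContext, conf)

// Read an HBase table as a DataFrame (placeholder table and mapping)
val df = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping", "id STRING :key, name STRING cf1:name")
  .option("hbase.table", "table1")
  .load()
df.show()

// Writing back uses the same format and options
df.write
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping", "id STRING :key, name STRING cf1:name")
  .option("hbase.table", "table1")
  .save()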