
Hive-on-Spark cannot read HBase tables in CDH 6.2.0


Hello,

 

While attempting to upgrade from CDH 5.14.0 to CDH 6.2.0, I've been testing some Hive-on-Spark queries that run over HBase tables. In doing so, I've run into this odd ClassNotFoundException:

 

ERROR : Spark job[-1] failed
java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:221) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:114) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.mapred.TableMapReduceUtil.initCredentials(TableMapReduceUtil.java:307) ~[hbase-mapreduce-2.1.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplitsInternal(HiveHBaseTableInputFormat.java:310) ~[hive-hbase-handler-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:302) ~[hive-hbase-handler-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:349) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:468) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:364) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:554) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at scala.Option.getOrElse(Option.scala:121) ~[scala-library-2.11.12.jar:?]
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:267) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.api.java.JavaRDDLike$class.getNumPartitions(JavaRDDLike.scala:65) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.api.java.AbstractJavaRDDLike.getNumPartitions(JavaRDDLike.scala:45) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:252) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:179) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:130) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:355) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:400) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:365) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_191]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_191]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	... 26 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2537) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:277) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_191]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_191]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	... 26 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2505) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2529) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:277) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_191]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_191]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	... 26 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2409) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2503) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2529) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:277) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_191]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_191]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	... 26 more
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found

I call it "odd" because the missing class, org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener, is present in hbase-client-2.1.0-cdh6.2.0.jar... and, as the stack trace shows, so are several other classes from that same jar that *were* found, such as org.apache.hadoop.hbase.client.ConnectionFactory. It seems very strange that those classes would be on the classpath while org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener would not. (And, indeed, the HiveServer2 instance's logs confirm that hbase-client-2.1.0-cdh6.2.0.jar is, in fact, on its classpath.)
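To double-check that the class really is in the jar, one can list the jar's entries directly rather than trusting the classpath logs. A minimal sketch in Python (the parcel path in the usage example below is an assumption about a typical CDH layout, not something from my cluster):

```python
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar contains the given fully-qualified class.

    Inner classes keep the '$' in the entry name, so
    'org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener'
    maps to 'org/apache/hadoop/hbase/client/ClusterStatusListener$MulticastListener.class'.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Usage (path is hypothetical):

    jar_contains_class(
        "/opt/cloudera/parcels/CDH/jars/hbase-client-2.1.0-cdh6.2.0.jar",
        "org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener")

In my case this kind of check comes back True, which is what makes the ClassNotFoundException so puzzling.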

 

Also notable: 1) this only happens for queries that spawn a job, so "select * from <hbase_table> limit 10" works just fine; and 2) it only happens when Spark is Hive's execution engine; switch back to MapReduce, and the same queries run without any problems.
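For reference, the engine switch that toggles the failure is just a session-level setting; a minimal repro sketch (the table name is a placeholder):

```sql
-- Fails at split generation with the ClassNotFoundException above:
SET hive.execution.engine=spark;
SELECT count(*) FROM some_hbase_backed_table;

-- The same query succeeds on MapReduce:
SET hive.execution.engine=mr;
SELECT count(*) FROM some_hbase_backed_table;
```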

 

Any ideas as to what could be causing this?