
Hive-on-Spark cannot read HBase tables in CDH 6.2.0


Hello,

 

While attempting to upgrade from CDH 5.14.0 to CDH 6.2.0, I've been testing some Hive-on-Spark queries that run over HBase tables. In doing so, I've run into this odd ClassNotFoundException:

 

ERROR : Spark job[-1] failed
java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:221) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:114) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.mapred.TableMapReduceUtil.initCredentials(TableMapReduceUtil.java:307) ~[hbase-mapreduce-2.1.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplitsInternal(HiveHBaseTableInputFormat.java:310) ~[hive-hbase-handler-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:302) ~[hive-hbase-handler-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:349) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:468) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:364) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:554) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at scala.Option.getOrElse(Option.scala:121) ~[scala-library-2.11.12.jar:?]
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:267) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.api.java.JavaRDDLike$class.getNumPartitions(JavaRDDLike.scala:65) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.spark.api.java.AbstractJavaRDDLike.getNumPartitions(JavaRDDLike.scala:45) ~[spark-core_2.11-2.4.0-cdh6.2.0.jar:2.4.0-cdh6.2.0]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:252) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:179) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:130) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:355) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:400) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:365) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_191]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_191]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	... 26 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2537) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:277) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_191]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_191]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	... 26 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2505) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2529) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:277) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_191]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_191]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	... 26 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2409) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2503) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2529) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
	at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:277) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_191]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_191]
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219) ~[hbase-client-2.1.0-cdh6.2.0.jar:?]
	... 26 more
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found

I call it "odd" because the missing class, org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener, is present in hbase-client-2.1.0-cdh6.2.0.jar... and, as can be seen in the stack trace, so are a number of other classes that *were* found, such as org.apache.hadoop.hbase.client.ConnectionFactory. It seems very strange that these classes would be on the HiveServer2 instance's classpath, but org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener would not. (And, indeed, based on the HiveServer2 instance's logs, hbase-client-2.1.0-cdh6.2.0.jar is, in fact, on the classpath.)
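To double-check that claim, note that a jar is just a zip archive, so the presence of the class file can be verified directly. A minimal sketch (the jar path you pass in is your own, e.g. the hbase-client jar from the CDH parcel; nothing here is specific to HBase):

```python
# Sketch: verify that a fully-qualified class is packaged inside a jar.
# A jar is just a zip archive, so the standard-library zipfile module suffices.
import zipfile

def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if class_name (dotted form, with '$' for inner classes)
    has a corresponding .class entry in the jar."""
    # org.foo.Bar$Baz  ->  org/foo/Bar$Baz.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, calling it with `"org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener"` against the hbase-client jar confirms whether the class is really packaged there.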

 

Also notable: 1) this only happens for queries that spawn a job (so "select * from <hbase_table> limit 10" works just fine), and 2) it only happens with Spark as Hive's execution engine; switch the execution engine back to MapReduce, and the same queries run without any problems.
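For reference, this is how the behavior reproduces in beeline (the table name is a placeholder for any HBase-backed Hive table):

```sql
-- Works: no job is spawned, HiveServer2 fetches the rows directly
SELECT * FROM hbase_table LIMIT 10;

-- Fails with the ClassNotFoundException above when the engine is Spark
SET hive.execution.engine=spark;
SELECT count(*) FROM hbase_table;

-- The same query succeeds with the engine set back to MapReduce
SET hive.execution.engine=mr;
SELECT count(*) FROM hbase_table;
```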

 

Any ideas as to what could be causing this?
