Problems with Oozie Spark Action - Unable to read HBase Table via HBase Handler


Hi, I have a Cloudera Express 6.3.2 setup where I'm trying to submit an Oozie Spark Action that reads from an HBase Table (correctly mapped in Hive).
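
For context, the read itself is nothing exotic. A minimal sketch of what the script does (the table name is the one from the stack trace below; the rest is illustrative, and the actual script goes through sqlContext.table(), which resolves the same way):

from pyspark.sql import SparkSession

# Hive support is needed so Spark can resolve the HBase-backed Hive table
spark = (SparkSession.builder
         .appName("reportServicesCreditExtraction")
         .enableHiveSupport()
         .getOrCreate())

# "msgnet.hbase_utenti" is the Hive table mapped onto HBase via the HBase storage handler
hbase_utenti_DF = spark.table("msgnet.hbase_utenti")
hbase_utenti_DF.show(5)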

 

If I get into the PySpark CLI, adding the relevant "--jars" when calling it, I can read from the table without any problems.

 

The list of JARs I'm adding is the following (an example of the full invocation follows the list):

 

hive-hbase-handler-2.1.1-cdh6.3.2.jar
hbase-client-2.1.0-cdh6.3.2.jar
guava-11.0.2.jar
hbase-common-2.1.0-cdh6.3.2.jar
hbase-hadoop-compat-2.1.0-cdh6.3.2.jar
hbase-hadoop2-compat-2.1.0-cdh6.3.2.jar
hbase-protocol-2.1.0-cdh6.3.2.jar
hbase-server-2.1.0-cdh6.3.2.jar
htrace-core4-4.2.0-incubating.jar
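
To be concrete, the working interactive call looks roughly like this (the parcel directory is illustrative; adjust it to your install):

# illustrative parcel location; adjust to your install
JARS_DIR=/opt/cloudera/parcels/CDH/jars
JARS=$JARS_DIR/hive-hbase-handler-2.1.1-cdh6.3.2.jar
for j in hbase-client-2.1.0-cdh6.3.2.jar guava-11.0.2.jar hbase-common-2.1.0-cdh6.3.2.jar \
         hbase-hadoop-compat-2.1.0-cdh6.3.2.jar hbase-hadoop2-compat-2.1.0-cdh6.3.2.jar \
         hbase-protocol-2.1.0-cdh6.3.2.jar hbase-server-2.1.0-cdh6.3.2.jar \
         htrace-core4-4.2.0-incubating.jar; do
  JARS=$JARS,$JARS_DIR/$j
done
pyspark --master yarn --jars "$JARS"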

 

 

But if I try to run the same script via Oozie + Spark Action, I get the following exception:

 

[Screenshot of the exception: Schermata 2020-10-29 alle 16.21.11.png; the full stack trace is posted at the bottom]

 

Now, the exact same thing works on a CDH 5 setup; nevertheless, I've tried several things hoping to make it work here too, but to no avail.

 

What I've tried:

------------------

 

- Adding a "hive.aux.jars.path" property to BOTH the "hive-site.xml" and "hbase-site.xml" that I'm passing to my Spark Action with "--files":

[Screenshot of the "hive.aux.jars.path" entries: Schermata 2020-10-29 alle 16.32.34.png]
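
Concretely, the property I appended to both files looks like this (paths illustrative; the value continues with the remaining JARs from the list above, comma-separated):

<property>
  <name>hive.aux.jars.path</name>
  <!-- comma-separated local paths; illustrative location, list continues with the remaining HBase JARs -->
  <value>file:///opt/cloudera/parcels/CDH/jars/hive-hbase-handler-2.1.1-cdh6.3.2.jar,file:///opt/cloudera/parcels/CDH/jars/hbase-client-2.1.0-cdh6.3.2.jar</value>
</property>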

- Configuring the "Oozie Sharelib" and putting the JAR files under BOTH the "spark" and "hive" HDFS directories of the Sharelib (and of course setting "oozie.use.system.libpath" to "true" in my Oozie Workflow Configuration)
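
Roughly what I did (paths and the lib_<timestamp> directory are placeholders for my actual Sharelib location):

# copy the HBase/handler JARs into both sharelib subdirs
hdfs dfs -put /opt/cloudera/parcels/CDH/jars/hbase-*.jar /user/oozie/share/lib/lib_<timestamp>/spark/
hdfs dfs -put /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar /user/oozie/share/lib/lib_<timestamp>/spark/
hdfs dfs -put /opt/cloudera/parcels/CDH/jars/hbase-*.jar /user/oozie/share/lib/lib_<timestamp>/hive/
hdfs dfs -put /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar /user/oozie/share/lib/lib_<timestamp>/hive/
# tell Oozie to pick up the new sharelib contents
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate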

 

- Also tried to set an "oozie.libpath" option, in the same way as the previous step, pointing to a complete list of the involved HBase JARs (then removed this config, as I was getting an error telling me the relevant JARs could not be loaded multiple times, since they were already present in the Sharelib)
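
Something like this in the workflow configuration (the HDFS path is hypothetical):

oozie.use.system.libpath=true
oozie.libpath=hdfs:///user/msgnet/libs/hbase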

 

- Passing a relevant "SPARK_CLASSPATH" along with a "SPARK_HOME" as "oozie.launcher.yarn.app.mapreduce.am.env" Hadoop properties in my Oozie Workflow Configuration, with the list of all the JARs involved
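
i.e. a property along these lines in the workflow configuration (values illustrative; the real SPARK_CLASSPATH listed every HBase JAR explicitly):

<property>
  <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
  <!-- comma-separated KEY=VALUE pairs; paths illustrative -->
  <value>SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark,SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/jars/hbase-server-2.1.0-cdh6.3.2.jar</value>
</property>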

 

- Tried to add "--conf spark.driver.extraClassPath=..." and "--conf spark.executor.extraClassPath=..." configuration options, BOTH inside the Python script being called and in my Oozie Workflow Spark Action window in the Hue GUI
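
i.e. something along these lines in the action's options list (the classpath values here are illustrative):

--conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/jars/*
--conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/*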

 

- I've extensively searched the Community Forum and the web before posting this, but no joy. Also, this is the first time this has happened; I have other setups where this works as expected!

 

I don't know what else to try. Any help would be greatly appreciated. Below I'm posting the full stack trace.

 

Thank you for any insights!

Log Type: stdout

Log Upload Time: Thu Oct 29 15:36:22 +0100 2020

Log Length: 9563

Traceback (most recent call last):
  File "reportServicesCreditExtraction.py", line 74, in <module>
    hbase_utenti_DF = sqlContext.table("msgnet.hbase_utenti")
  File "/data/1/yarn/nm/usercache/msgnet/appcache/application_1602174532153_0765/container_1602174532153_0765_02_000001/pyspark.zip/pyspark/sql/context.py", line 371, in table
  File "/data/1/yarn/nm/usercache/msgnet/appcache/application_1602174532153_0765/container_1602174532153_0765_02_000001/pyspark.zip/pyspark/sql/session.py", line 791, in table
  File "/data/1/yarn/nm/usercache/msgnet/appcache/application_1602174532153_0765/container_1602174532153_0765_02_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/data/1/yarn/nm/usercache/msgnet/appcache/application_1602174532153_0765/container_1602174532153_0765_02_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/data/1/yarn/nm/usercache/msgnet/appcache/application_1602174532153_0765/container_1602174532153_0765_02_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o86.table.
: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:246)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:235)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.getInputFormatClass(HBaseStorageHandler.java:133)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$7$$anonfun$12$$anonfun$apply$10.apply(HiveClientImpl.scala:463)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$7$$anonfun$12$$anonfun$apply$10.apply(HiveClientImpl.scala:463)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$7$$anonfun$12.apply(HiveClientImpl.scala:463)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$7$$anonfun$12.apply(HiveClientImpl.scala:463)
	at scala.Option.orElse(Option.scala:289)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$7.apply(HiveClientImpl.scala:462)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$7.apply(HiveClientImpl.scala:376)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:376)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:374)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
	at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:374)
	at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
	at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:84)
	at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:120)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:737)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:737)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
	at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:736)
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:146)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:701)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:730)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:685)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:715)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:708)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:708)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:654)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
	at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
	at scala.collection.immutable.List.foldLeft(List.scala:84)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
	at org.apache.spark.sql.SparkSession.table(SparkSession.scala:637)
	at org.apache.spark.sql.SparkSession.table(SparkSession.scala:633)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:255)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:235)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 84 more

15:36:20.966 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User application exited with status 1