
Zeppelin 0.8 spark2 interpreter in yarn-cluster mode not working with Hive



Hi all,

When I try the spark2 interpreter in yarn-cluster mode, the integration with Hive stops working.

Has anyone else experienced this?

Simple code:

import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

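// getOrCreate() reuses an existing SparkSession if one is already active
val hiveSession = SparkSession.builder().appName("Spark Hive Example").enableHiveSupport().getOrCreate()
// Query through the session explicitly instead of relying on an implicitly imported sql()
val hiveResult = hiveSession.sql("SELECT * FROM SOME_DB.some_table LIMIT 10")
hiveResult.show()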

Result:

import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
hiveSession: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@22774caa
org.apache.spark.sql.AnalysisException: Table or view not found: `SOME_DB`.`some_table`; line 1 pos 14;
'GlobalLimit 10
+- 'LocalLimit 10
   +- 'Project [*]
      +- 'UnresolvedRelation `SOME_DB`.`some_table`

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:82)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:637)
  ... 52 elided

Configuration of the interpreter:

SPARK_HOME	/usr/hdp/current/spark2-client/	
master	yarn-cluster
spark.app.name	Zeppelin-Spark2-Cluster
spark.cores.max	4
spark.executor.memory	1g
spark.yarn.queue	UserQ
zeppelin.R.cmd	R
zeppelin.R.image.width	100%
zeppelin.R.knitr	true
zeppelin.R.render.options	out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F, fig.retina = 2
zeppelin.dep.additionalRemoteRepository	spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo	local-repo
zeppelin.pyspark.python	python
zeppelin.pyspark.useIPython	true
zeppelin.spark.concurrentSQL	false
zeppelin.spark.enableSupportedVersionCheck	true
zeppelin.spark.importImplicit	true
zeppelin.spark.maxResult	1000
zeppelin.spark.printREPLOutput	true
zeppelin.spark.sql.interpolation	false
zeppelin.spark.sql.stacktrace	true
zeppelin.spark.uiWebUrl	
zeppelin.spark.useHiveContext	true
zeppelin.spark.useNew	true
3 REPLIES

Re: Zeppelin 0.8 spark2 interpreter in yarn-cluster mode not working with Hive

@Alexandre Juma, does it work in yarn-client mode? Also, try adding hive-site.xml to the interpreter config using spark.files.
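
To narrow this down, it may help to check from a notebook paragraph what the driver actually sees. The snippet below is only a diagnostic sketch, not a confirmed fix. It assumes it runs in a %spark2 paragraph and uses only standard Spark and JVM calls; the variable names (session, hiveSite) are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Reuse whatever session is already active in the interpreter
val session = SparkSession.builder().enableHiveSupport().getOrCreate()
val sc = session.sparkContext

// Where the driver thinks it is running
println(s"master      = ${sc.master}")
println(s"deploy mode = ${sc.getConf.get("spark.submit.deployMode", "client")}")

// "hive" means the Hive catalog is in use; "in-memory" means Hive support is not active
println(s"catalog     = ${session.conf.get("spark.sql.catalogImplementation", "in-memory")}")

// getResource returns null when hive-site.xml is not on the driver's classpath
val hiveSite = Thread.currentThread().getContextClassLoader.getResource("hive-site.xml")
println(s"hive-site   = ${Option(hiveSite).map(_.toString).getOrElse("not found on classpath")}")

// Databases visible to this session; SOME_DB should appear if the real metastore is reached
session.catalog.listDatabases().show(false)
```

Running the same paragraph once with master=yarn-client and once with master=yarn-cluster and comparing the output should show whether the driver in cluster mode is missing hive-site.xml or is talking to a different catalog.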


Re: Zeppelin 0.8 spark2 interpreter in yarn-cluster mode not working with Hive

@Alexandre Juma

The Zeppelin Spark interpreter does not support yarn-cluster mode; it only supports yarn-client mode. If you want to run in cluster mode, you need to use the Livy interpreter.

HTH


Re: Zeppelin 0.8 spark2 interpreter in yarn-cluster mode not working with Hive


@snemuri, yes. If I simply change the interpreter setting to

master=yarn-client 

the code works perfectly.

I have already tried adding the following to the interpreter config:

HADOOP_CONF_DIR=/etc/hadoop/conf/
spark.yarn.dist.files=/etc/hive/conf/hive-site.xml
spark.files=/etc/hive/conf/hive-site.xml
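
To see whether these settings actually delivered the file to the driver, something like the following could be run in a paragraph. This is a rough sketch under assumptions: SparkFiles.get resolves a name against the directory Spark uses for files added through spark.files or SparkContext.addFile, and I am assuming that in yarn-cluster mode YARN places distributed files in the container's working directory. The variable names are made up for illustration.

```scala
import java.io.File
import org.apache.spark.SparkFiles

// Location Spark uses for files added via spark.files / SparkContext.addFile
val fromSparkFiles = new File(SparkFiles.get("hive-site.xml"))
println(s"SparkFiles copy:  ${fromSparkFiles.getAbsolutePath} (exists: ${fromSparkFiles.exists()})")

// Working directory of the driver process
// (assumption: YARN localizes distributed files here in yarn-cluster mode)
val inWorkDir = new File("hive-site.xml")
println(s"Working dir copy: ${inWorkDir.getAbsolutePath} (exists: ${inWorkDir.exists()})")
```

If neither path exists in yarn-cluster mode, the properties above are probably not reaching the driver at all, which would point at how the interpreter is launched rather than at Hive itself.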

@falbani, I'm trying this because the release notes for Apache Zeppelin 0.8.0 say it now supports YARN in yarn-cluster mode:

https://zeppelin.apache.org/docs/0.8.0/interpreter/spark.html#2-set-master-in-interpreter-menu

Thanks for the tips
