Created 12-18-2017 08:56 AM
Hi,
I'm working in Zeppelin with the Spark interpreter. Every time I try to show a DataFrame, it either hangs or doesn't recognize the table. Here's my code, followed by the stack trace I get:
%spark
import sqlContext.implicits._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val jdbcTable1 = "V_EQUIPMENT"
val equipment = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url"     -> jdbcUrl,
    "dbtable" -> jdbcTable1,
    "driver"  -> jdbcDriver))
  .load()
val df = sqlContext.sql("select * from equipment")
df.show()
equipment.show()
org.apache.spark.sql.AnalysisException: Table not found: equipment; line 1 pos 14
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
  at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
  at scala.collection.AbstractIterator.to(Iterator.scala:1157)
  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
  at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
  at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:54)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:309)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:299)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:36)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:36)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
  at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
  at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
  at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:67)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:72)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:74)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:76)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:78)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:80)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:82)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:84)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:86)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:88)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:90)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:92)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:94)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:96)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:98)
  at $iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:100)
  at $iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:102)
  at $iwC$iwC$iwC$iwC$iwC.<init>(<console>:104)
  at $iwC$iwC$iwC$iwC.<init>(<console>:106)
  at $iwC$iwC$iwC.<init>(<console>:108)
  at $iwC$iwC.<init>(<console>:110)
  at $iwC.<init>(<console>:112)
  at <init>(<console>:114)
  at .<init>(<console>:118)
  at .<clinit>(<console>)
  at .<init>(<console>:7)
  at .<clinit>(<console>)
  at $print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
  at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
  at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
  at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:984)
  at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1189)
  at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1156)
  at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1149)
  at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
  at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
  at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
  at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
I don't get why Spark can't recognize the table.
Greetings,
Mario
Created 12-18-2017 06:49 PM
@Mario Borys: You should register the DataFrame as a temp table and then run your select query against that registered table:
equipment.registerTempTable("equipment_table")
sqlContext.sql("select * from equipment_table")
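For reference, a minimal sketch of the whole flow (assuming jdbcUrl and jdbcDriver are defined in an earlier paragraph, as in your snippet):

%spark
// Load the JDBC view into a DataFrame. The val name "equipment" is
// only a Scala identifier; Spark SQL knows nothing about it yet.
val equipment = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url"     -> jdbcUrl,       // assumed defined in an earlier paragraph
    "dbtable" -> "V_EQUIPMENT",
    "driver"  -> jdbcDriver))   // assumed defined in an earlier paragraph
  .load()

// Register the DataFrame under a table name; that name, not the
// val name, is what sqlContext.sql() can resolve.
equipment.registerTempTable("equipment_table")

val df = sqlContext.sql("select * from equipment_table")
df.show()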
Created 12-19-2017 12:55 PM
Like you said, it's working, but I still can't work with this table in %spark.sql.
Do I have to import something, or work with another version?
I'm working with Spark 1.6.1; should I move to Spark 2.x? Sorry for the questions, I'm pretty new to Hadoop.
Created 12-19-2017 03:38 PM
@Mario Borys This tutorial and Zeppelin notebook should help you:
https://hortonworks.com/tutorial/hands-on-tour-of-apache-spark-in-5-minutes/
https://raw.githubusercontent.com/hortonworks-gallery/zeppelin-notebooks/hdp-2.6/2CBTZPY14/note.json
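On your %spark.sql question: in Zeppelin, the %sql / %spark.sql paragraphs share the same SQLContext as the %spark interpreter in the same interpreter group, so a temp table registered in a %spark paragraph should be queryable there directly; no extra imports or a Spark 2.x upgrade should be needed for that. A minimal sketch, assuming the equipment DataFrame from your first paragraph:

%spark
// Run this paragraph first so the temp table exists in the shared SQLContext.
equipment.registerTempTable("equipment_table")

%spark.sql
-- This paragraph can then query the registered name directly.
select * from equipment_table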
Feel free to accept the answer if this helps you.