Created 02-09-2017 03:35 PM
I have a text file which I am trying to convert into a DataFrame and run a SQL query on, but it does not seem to be working properly.
Can someone please help me? I have tried it both ways, registering a temp table and without it. I need this to work.
case class f1(ckt_id: String, location: String, usage: String, port: String, machine: String)

val input_file = sc.textFile("file:///dev/spark/input_file.txt")
  .map(_.split("\\|"))
  .map(x => f1(x(0), x(1), x(2), x(3), x(4)))
  .toDF()

scala> input_file.show()
+-----------+--------+-----+----+-------+
|     ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
|     ckt_id|location|usage|port|machine|
| AXZCSD21DF|     USA|  2GB| 101|   MAC1|
| ABZCSD21DF|     OTH|  4GB| 101|   MAC2|
| AXZCSD21DF|     USA|  6GB| 101|   MAC4|
| BXZCSD21DF|     USA|  7GB| 101|   MAC6|
| CXZCSD21DF|     IND|  2GB| 101|   MAC9|
| AXZCSD21DF|     USA|  1GB| 101|   MAC0|
| AXZCSD22DF|     IND|  9GB| 101|   MAC3|
|ADZZCSD21DF|     USA|  1GB| 101|   MAC4|
| AXZCSD21DF|     USA|  2GB| 101|   MAC5|
| XZDCSD21DF|     OTH|  2GB| 101|   MAC1|
+-----------+--------+-----+----+-------+

scala> input_file.printSchema()
root
 |-- ckt_id: string (nullable = true)
 |-- location: string (nullable = true)
 |-- usage: string (nullable = true)
 |-- port: string (nullable = true)
 |-- machine: string (nullable = true)

scala> input_file.registerTempTable("input_file")

ERROR:
scala> val op = sqlContext.sql("SELECT * FROM input_file")
org.apache.spark.sql.AnalysisException: Table not found: input_file;
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
  ... (remaining frames are spark-shell REPL $iwC wrappers)
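One more thing I notice in the show() output above: the header line of the file is coming through as a data row. I can drop it like this (assuming the header is the first line of the file), but the SQL error is the real problem:

val raw = sc.textFile("file:///dev/spark/input_file.txt")
val header = raw.first()                    // "ckt_id|location|usage|port|machine"
val input_file = raw.filter(_ != header)    // drop the header line
  .map(_.split("\\|"))
  .map(x => f1(x(0), x(1), x(2), x(3), x(4)))
  .toDF()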
Created 02-10-2017 12:39 AM
Dinesh, I recreated your scenario (in spark-shell) and it's working fine for me. Are you running this in Zeppelin by chance? I've noticed that Zeppelin sometimes doesn't create the Hive context correctly, so to make sure you're doing it correctly, run the following code.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
// your code here
What will happen is we'll create a new HiveContext, and it should fix your problem. I think we're losing the pointer to your sqlContext for some reason, so by recreating it we guarantee that the temp table is registered to a sqlContext, and that the same sqlContext is the one being queried. Let me know if that fixes your problem, or if you have more issues.
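Here is a minimal sketch of the pattern, assuming spark-shell on Spark 1.x (where sc already exists); the sample data is made up just for illustration:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)    // recreate the context once
import sqlContext.implicits._

// Toy DataFrame just to illustrate; replace with your own load
val df = sc.parallelize(Seq(("AXZCSD21DF", "USA", "2GB", "101", "MAC1")))
  .toDF("ckt_id", "location", "usage", "port", "machine")

df.registerTempTable("input_file")      // registered against this sqlContext

// Query through the SAME sqlContext object that did the registration
sqlContext.sql("SELECT * FROM input_file").show()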
Created 02-10-2017 12:05 PM
I am running it in the spark-shell and am getting the same error!
Created 02-10-2017 03:46 PM
I am unable to run a SELECT query on my input_file_tmp, but at the same time I can do it on gsam_temp, which is the DataFrame I created from the SQL table. If I could query both DataFrames, it would be much easier for me to finish this off.
Here is the complete code:
import sqlContext.implicits._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val hiveContext = new HiveContext(sc)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Load the DB table into a DataFrame
val gsam = hiveContext.read.format("jdbc")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("url", "jdbc:oracle:thin:NPIDWDEV/sysdwnpi@scan-nsgnp.ebiz.verizon.com:1521/nsgdev")
  .option("dbtable", "GSAM_REF")
  .load()
gsam.registerTempTable("gsam_temp")

// Case class to load the input file from local disk or HDFS
case class f1(ckt_id: String, location: String, usage: String, port: String, machine: String)

val input_file = sc.textFile("file:///data04/dev/v994292/spark/input_file.txt")
  .map(_.split("\\|"))
  .map(x => f1(x(0), x(1), x(2), x(3), x(4)))
  .toDF()
input_file.registerTempTable("input_file_tmp")
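Concretely, this is a sketch of what I am seeing, with both queries going through hiveContext:

// Works: gsam_temp was registered through hiveContext
hiveContext.sql("SELECT * FROM gsam_temp").show()

// Fails with "Table not found: input_file_tmp"
hiveContext.sql("SELECT * FROM input_file_tmp").show()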
Created 11-03-2017 09:01 AM
I had the same error. Based on this recommendation, I added a new property to the Zeppelin Spark2 interpreter: spark.deploy.maxExecutorRetries=10. It worked for me.
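In case it helps anyone else, in the Zeppelin UI this is just a name/value pair added on the interpreter settings page for the Spark2 interpreter; the same setting can also go in spark-defaults.conf outside Zeppelin:

spark.deploy.maxExecutorRetries    10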