Created 02-09-2017 03:35 PM
I have a text file which I am trying to convert into a DataFrame and run a SQL query on, but it does not seem to be working properly.
Can someone please help me? I have tried it both ways, registering a temp table and without it. I need this to work.
case class f1(ckt_id: String, location: String, usage: String, port: String, machine: String)

val input_file = sc.textFile("file:///dev/spark/input_file.txt")
  .map(_.split("\\|"))
  .map(x => f1(x(0), x(1), x(2), x(3), x(4)))
  .toDF()

scala> input_file.show()
+-----------+--------+-----+----+-------+
|     ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
|     ckt_id|location|usage|port|machine|
| AXZCSD21DF|     USA|  2GB| 101|   MAC1|
| ABZCSD21DF|     OTH|  4GB| 101|   MAC2|
| AXZCSD21DF|     USA|  6GB| 101|   MAC4|
| BXZCSD21DF|     USA|  7GB| 101|   MAC6|
| CXZCSD21DF|     IND|  2GB| 101|   MAC9|
| AXZCSD21DF|     USA|  1GB| 101|   MAC0|
| AXZCSD22DF|     IND|  9GB| 101|   MAC3|
|ADZZCSD21DF|     USA|  1GB| 101|   MAC4|
| AXZCSD21DF|     USA|  2GB| 101|   MAC5|
| XZDCSD21DF|     OTH|  2GB| 101|   MAC1|
+-----------+--------+-----+----+-------+

scala> input_file.printSchema()
root
 |-- ckt_id: string (nullable = true)
 |-- location: string (nullable = true)
 |-- usage: string (nullable = true)
 |-- port: string (nullable = true)
 |-- machine: string (nullable = true)

scala> input_file.registerTempTable("input_file")

ERROR:
scala> val op = sqlContext.sql("SELECT * FROM input_file")
org.apache.spark.sql.AnalysisException: Table not found: input_file;
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
  ... (remaining frames are spark-shell REPL $iwC wrappers)
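One more thing I notice in the show() output above: the header line of the file is coming through as a data row. I can drop it like this (assuming the header is the first line of the file), but the SQL error is the real problem:

val raw = sc.textFile("file:///dev/spark/input_file.txt")
val header = raw.first()                    // "ckt_id|location|usage|port|machine"
val input_file = raw.filter(_ != header)    // drop the header line
  .map(_.split("\\|"))
  .map(x => f1(x(0), x(1), x(2), x(3), x(4)))
  .toDF()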
Created 02-10-2017 12:39 AM
Dinesh, I recreated your scenario (in spark-shell) and it's working fine for me. Are you running this in Zeppelin by chance? I've noticed that Zeppelin sometimes doesn't create the Hive context correctly, so to make sure you're doing it correctly, run the following code.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
// your code here
What will happen is we'll create a new HiveContext, and it should fix your problem. I think we're losing the pointer to your sqlContext for some reason, so by recreating it we guarantee that the temp table is registered to a sqlContext, and that the same sqlContext is the one being queried. Let me know if that fixes your problem, or if you have more issues.
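Here is a minimal sketch of the pattern, assuming spark-shell on Spark 1.x (where sc already exists); the sample data is made up just for illustration:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)    // recreate the context once
import sqlContext.implicits._

// Toy DataFrame just to illustrate; replace with your own load
val df = sc.parallelize(Seq(("AXZCSD21DF", "USA", "2GB", "101", "MAC1")))
  .toDF("ckt_id", "location", "usage", "port", "machine")

df.registerTempTable("input_file")      // registered against this sqlContext

// Query through the SAME sqlContext object that did the registration
sqlContext.sql("SELECT * FROM input_file").show()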
Created 02-10-2017 12:05 PM
I am running it in the spark-shell and am getting the same error!
Created 02-10-2017 03:46 PM
I am unable to run a SELECT query on my input_file_tmp, but at the same time I can do it on gsam_temp, which is the DataFrame I created from the SQL table. If I could query both DataFrames, it would be much easier for me to finish this off.
Here is the complete code:
import sqlContext.implicits._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val hiveContext = new HiveContext(sc)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Load the DB table into a DataFrame
val gsam = hiveContext.read.format("jdbc")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("url", "jdbc:oracle:thin:NPIDWDEV/sysdwnpi@scan-nsgnp.ebiz.verizon.com:1521/nsgdev")
  .option("dbtable", "GSAM_REF")
  .load()
gsam.registerTempTable("gsam_temp")

// Case class to load the input file from local disk or HDFS
case class f1(ckt_id: String, location: String, usage: String, port: String, machine: String)

val input_file = sc.textFile("file:///data04/dev/v994292/spark/input_file.txt")
  .map(_.split("\\|"))
  .map(x => f1(x(0), x(1), x(2), x(3), x(4)))
  .toDF()
input_file.registerTempTable("input_file_tmp")
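Concretely, this is a sketch of what I am seeing, with both queries going through hiveContext:

// Works: gsam_temp was registered through hiveContext
hiveContext.sql("SELECT * FROM gsam_temp").show()

// Fails with "Table not found: input_file_tmp"
hiveContext.sql("SELECT * FROM input_file_tmp").show()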
Created 11-03-2017 09:01 AM
I had the same error. Based on this recommendation, I added a new property to the Zeppelin Spark2 interpreter: spark.deploy.maxExecutorRetries=10. It worked for me.
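In case it helps anyone else, in the Zeppelin UI this is just a name/value pair added on the interpreter settings page for the Spark2 interpreter; the same setting can also go in spark-defaults.conf outside Zeppelin:

spark.deploy.maxExecutorRetries    10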