Reply
Highlighted
Explorer
Posts: 62
Registered: ‎01-22-2014

Left Semi Join and NOT IN in Spark

[ Edited ]

Hi,

 

I am using Spark SchemaRDD.

 

I am trying to use the queries like

 

"Select a.Col1,b.Col2 from TBL1 a where a.Col1 NOT IN (Select Col1 from TBL2)" and 

 

"Select a.Col1,b.Col2 from TBL1 a left semi join  TBL2 b  on a.Col1 = b.Col2".

 

Both the queries did not work and threw me the below error. Are these queries not supported in Schema RDD? Please clarify. Thanks!

 

 

 

SELECT a.Col1,a.Col2 FROM TBL1 a  WHERE a.Col1 NOT IN (SELECT Col1 FROM TBL2)
                                                                      ^
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:47)
	at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:70)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:150)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
	at $iwC$$iwC$$iwC.<init>(<console>:28)
	at $iwC$$iwC.<init>(<console>:30)
	at $iwC.<init>(<console>:32)
	at <init>(<console>:34)
	at .<init>(<console>:38)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
	at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

 

Update - Similarly queries with the MINUS statement also did not work . Just wanted to confirm if these issues are expected in SchemaRDD.

 

scala> val nameZ1 = sql("SELECT Col1 FROM TBL1 MINUS SELECT Col1 FROM TBL2")
java.lang.RuntimeException: [1.32] failure: ``UNION'' expected but identifier MINUS found
SELECT Col1 FROM TBL1 MINUS SELECT Col2 FROM TBL2

 

Announcements