Created 04-14-2017 05:59 PM
I have created a hivecontext in spark and i am reading hive ORC tables from hivecontext into spark dataframes. I have saved that dataframe into temp table.
I am looking for how to specify left outer join when running sql queries on that temporary table? Any help would be appreciated. Code that i am running is mentioned below.
import org.apache.spark.sql._
val hivecontext = new org.apache.spark.sql.hive.HiveContext(sc)
val a = hivecontext.table("customer.a_orc")
val b = hivecontext.table("customer.b_orc")
a.registerTempTable("a")
b.registerTempTable("b")
val output = hivecontext.sql("select a.*,b.* from a,b where a left outer join b on (a.id=b.id))
The output dataframe shown above is giving error. Is there a way to specify left outer join like this or do i have to create separate dataframes?
Created 04-14-2017 06:09 PM
I think your query is syntactically incorrect. Try the following:
SELECT <columns to select> FROM A LEFT OUTER JOIN B ON A.id = B.id
Created 04-14-2017 09:08 PM
@rahul gulati This is how I did mine,
val outer_join = a.join(b, df1("id") === df2("id"), "left_outer")
Created 04-14-2017 09:10 PM
he is using a HiveContext. He needs to provide a SQL.