Support Questions

Find answers, ask questions, and share your expertise

How to specify left outer join in dataframe query?

Explorer

I have created a hivecontext in spark and i am reading hive ORC tables from hivecontext into spark dataframes. I have saved that dataframe into temp table.

I am looking for how to specify left outer join when running sql queries on that temporary table? Any help would be appreciated. Code that i am running is mentioned below.

import org.apache.spark.sql._

val hivecontext = new org.apache.spark.sql.hive.HiveContext(sc)

val a = hivecontext.table("customer.a_orc")

val b = hivecontext.table("customer.b_orc")

a.registerTempTable("a")

b.registerTempTable("b")

val output = hivecontext.sql("select a.*,b.* from a,b where a left outer join b on (a.id=b.id))

The output dataframe shown above is giving error. Is there a way to specify left outer join like this or do i have to create separate dataframes?

3 REPLIES 3

Super Guru
@rahul gulati

I think your query is syntactically incorrect. Try the following:

SELECT <columns to select> FROM A LEFT OUTER JOIN B ON A.id = B.id

Expert Contributor

@rahul gulati This is how I did mine,

val outer_join = a.join(b, df1("id") === df2("id"), "left_outer")

Super Guru

he is using a HiveContext. He needs to provide a SQL.