Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

How to specify left outer join in dataframe query?

Explorer

I have created a hivecontext in spark and i am reading hive ORC tables from hivecontext into spark dataframes. I have saved that dataframe into temp table.

I am looking for how to specify left outer join when running sql queries on that temporary table? Any help would be appreciated. Code that i am running is mentioned below.

import org.apache.spark.sql._

val hivecontext = new org.apache.spark.sql.hive.HiveContext(sc)

val a = hivecontext.table("customer.a_orc")

val b = hivecontext.table("customer.b_orc")

a.registerTempTable("a")

b.registerTempTable("b")

val output = hivecontext.sql("select a.*,b.* from a,b where a left outer join b on (a.id=b.id))

The output dataframe shown above is giving error. Is there a way to specify left outer join like this or do i have to create separate dataframes?

3 REPLIES 3

Super Guru
@rahul gulati

I think your query is syntactically incorrect. Try the following:

SELECT <columns to select> FROM A LEFT OUTER JOIN B ON A.id = B.id

Expert Contributor

@rahul gulati This is how I did mine,

val outer_join = a.join(b, df1("id") === df2("id"), "left_outer")

Super Guru

he is using a HiveContext. He needs to provide a SQL.

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.