Member since
07-19-2017
1
Post
0
Kudos Received
0
Solutions
07-28-2017
08:19 AM
how will i avoid shuffle if i have to join both the data frames on 2 join keys, df1 = sqlContext.sql("SELECT * FROM TABLE1 CLSUTER BY JOINKEY1,JOINKEY2") df2 = sqlContext.sql("SELECT * FROM TABLE2 CLUSTER BY JOINKEY1,JOINKEY2") df3 = sqlContext.sql("SELECT * FROM TABLE3 CLUSTER BY JOINKEY1,JOINKEY2") df4=df1.join(df2, df1.JOINKEY1=df2.JOINJEY1 and df1.JOINKEY2=df2.JOINKEY2, "inner") df5 =df1.except(df4).union(df3)
... View more