
How to do a left outer join in PySpark?



I created two tables without headers:

loc2=loc1.map(lambda (x,y,z): ('key', {'location': str(x), 'city':str(y), 'state' : str(z)}))
soc1=soc.map(lambda (a,b,c,d,f,g): ('key', {'id': str(a), 'firstname':str(b),'secondname':str(c),'lastname':str(d),  'city' :str(f), 'state':str(g)}))

How can I join the two tables soc1 and loc2 on the common city field?


Re: how to do left outer join in pyspark?

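See the PySpark docs at http://spark.apache.org/docs/1.6.1/api/python/pyspark.html?highlight=join#pyspark.RDD.leftOuterJoin for the syntax and an example of a left outer join with RDDs. Note that leftOuterJoin matches records on the key of each (key, value) pair, so both RDDs need to be keyed by the city value rather than by the constant string 'key'.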
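As a rough sketch of how that could look for the two RDDs in the question: the helpers below re-key each record by its city field and then call leftOuterJoin. The function names (key_soc_by_city, key_loc_by_city, left_join_on_city) are made up for illustration, and the field order is assumed to match the lambdas in the question. Plain functions are used instead of tuple-unpacking lambdas because that lambda syntax only works in Python 2.

```python
def key_soc_by_city(rec):
    # Assumed field order, taken from the question's lambda:
    # (id, firstname, secondname, lastname, city, state)
    a, b, c, d, f, g = rec
    return (str(f), {'id': str(a), 'firstname': str(b),
                     'secondname': str(c), 'lastname': str(d),
                     'city': str(f), 'state': str(g)})


def key_loc_by_city(rec):
    # Assumed field order: (location, city, state)
    x, y, z = rec
    return (str(y), {'location': str(x), 'city': str(y), 'state': str(z)})


def left_join_on_city(soc, loc1):
    # soc and loc1 are the RDDs of raw tuples from the question.
    soc1 = soc.map(key_soc_by_city)
    loc2 = loc1.map(key_loc_by_city)
    # Every soc1 record is kept; a city with no match in loc2
    # is paired with None on the right-hand side.
    return soc1.leftOuterJoin(loc2)
```

Calling left_join_on_city(soc, loc1).collect() should then return elements of the form (city, (person_dict, location_dict_or_None)). If the city values differ in case or surrounding whitespace between the two tables, they would need to be normalized in the key functions before joining.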
