<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: spark join with udf fails in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/spark-join-with-udf-fails/m-p/122873#M85626</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10875/xrcsblue.html" nodeid="10875"&gt;@xrcs blue&lt;/A&gt; &lt;/P&gt;&lt;P&gt;It looks like you are using the Spark Python API. The PySpark documentation says:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;join&lt;/STRONG&gt; : &lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;on&lt;/STRONG&gt; – a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If &lt;CITE&gt;on&lt;/CITE&gt; is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;So, do the join columns exist in both DataFrames? Also, could you build the join condition separately as a list of Column expressions and then pass it to the join() method, like this:&lt;/P&gt;&lt;PRE&gt;&amp;gt;&amp;gt;&amp;gt; cond = [df.name == df3.name, df.age == df3.age]
&amp;gt;&amp;gt;&amp;gt; df.join(df3, cond, 'outer')&lt;/PRE&gt;</description>
    <pubDate>Tue, 12 Jul 2016 01:35:19 GMT</pubDate>
    <dc:creator>phargis</dc:creator>
    <dc:date>2016-07-12T01:35:19Z</dc:date>
  </channel>
</rss>

