<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark can't join dataframes without using over a hundred GB of ram and going OOM? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-can-t-join-dataframes-without-using-over-a-hundred-GB/m-p/222532#M184402</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/83321/robcornell.html" nodeid="83321"&gt;@Robert Cornell&lt;/A&gt;&lt;BR /&gt;We have a similar issue with joining, even after bucketing and presorting of both tables it still throws to us this kind of behavior(in theory it simply should zip both dataframes without any additional operations). &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Only one difference is we use Spark SQL and outer join instead of RDD, but the symptoms look quite similar.  Did you manage to fix it?&lt;/P&gt;</description>
    <pubDate>Wed, 10 Oct 2018 18:02:03 GMT</pubDate>
    <dc:creator>ssemyonov</dc:creator>
    <dc:date>2018-10-10T18:02:03Z</dc:date>
  </channel>
</rss>