<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark create table from multiple jobs vs single job method in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-create-table-from-multiple-jobs-vs-single-job-method/m-p/92722#M21762</link>
    <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I do not think there is any difference. Spark executes statements lazily, so your second, two-job version will behave the same way as the first, single-job version, in my opinion.&lt;BR /&gt;&lt;BR /&gt;Cheers&lt;BR /&gt;Eric</description>
    <pubDate>Mon, 15 Jul 2019 10:12:48 GMT</pubDate>
    <dc:creator>EricL</dc:creator>
    <dc:date>2019-07-15T10:12:48Z</dc:date>
    <item>
      <title>Spark create table from multiple jobs vs single job method</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-create-table-from-multiple-jobs-vs-single-job-method/m-p/92328#M21761</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a table with a lot of data, and I want to create a new table from it based on some column values.&lt;/P&gt;&lt;P&gt;Which method is more efficient and friendlier to cluster resources?&lt;/P&gt;&lt;P&gt;Pseudo-code:&lt;/P&gt;&lt;P&gt;1. Single job:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;insert into myNewTable&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;select * from myOldTable&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where a=xxx etc.&lt;/P&gt;&lt;P&gt;2. Two jobs:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;Job 1: create a dataframe from the select statement&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;select * from myOldTable&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where a=xxx etc. as dataframe&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;Job 2: write the dataframe as a new table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;insert into myNewTable select * from dataframe&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:29:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-create-table-from-multiple-jobs-vs-single-job-method/m-p/92328#M21761</guid>
      <dc:creator>ChineduLB</dc:creator>
      <dc:date>2022-09-16T14:29:29Z</dc:date>
    </item>
    <item>
      <title>Re: Spark create table from multiple jobs vs single job method</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-create-table-from-multiple-jobs-vs-single-job-method/m-p/92722#M21762</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I do not think there is any difference. Spark executes statements lazily, so your second, two-job version will behave the same way as the first, single-job version, in my opinion.&lt;BR /&gt;&lt;BR /&gt;Cheers&lt;BR /&gt;Eric</description>
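      <!--
The lazy-evaluation argument in this reply can be sketched with a toy model in plain Python. This is only an illustration, not the Spark API: LazyTable, where, and insert_into are hypothetical names, and the rows are made-up sample data. The sketch assumes that nothing forces evaluation between the two steps of the second method; in real Spark, calling an action such as count or caching the dataframe in between would change the picture.

```python
# Toy model of lazy evaluation. NOT the Spark API: every name here is
# hypothetical and exists only to illustrate the reasoning in the reply.

class LazyTable:
    """Records a query plan; reads the source rows only when an action runs."""

    def __init__(self, rows, predicates=(), stats=None):
        self._rows = rows
        self._predicates = tuple(predicates)
        # Counter shared with derived tables, so scans of the source add up.
        self.stats = stats if stats is not None else {"scans": 0}

    def where(self, predicate):
        # Transformation: extends the plan and touches no data.
        return LazyTable(self._rows, self._predicates + (predicate,), self.stats)

    def insert_into(self, target):
        # Action: the only place where the source is actually scanned.
        self.stats["scans"] += 1
        for row in self._rows:
            if all(p(row) for p in self._predicates):
                target.append(row)


old_rows = [{"a": 1, "v": "x"}, {"a": 2, "v": "y"}, {"a": 1, "v": "z"}]

# Method 1: select and insert in a single statement.
one_step = LazyTable(old_rows)
new_table_1 = []
one_step.where(lambda r: r["a"] == 1).insert_into(new_table_1)

# Method 2: build the "dataframe" first, then write it.
two_step = LazyTable(old_rows)
filtered = two_step.where(lambda r: r["a"] == 1)  # plan only, no scan yet
scans_before_write = two_step.stats["scans"]
new_table_2 = []
filtered.insert_into(new_table_2)
```

In this model both methods scan the source exactly once and produce the same rows, which is the behaviour the reply describes for Spark under the assumption stated above.
      -->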
      <pubDate>Mon, 15 Jul 2019 10:12:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-create-table-from-multiple-jobs-vs-single-job-method/m-p/92722#M21762</guid>
      <dc:creator>EricL</dc:creator>
      <dc:date>2019-07-15T10:12:48Z</dc:date>
    </item>
  </channel>
</rss>

