<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Best practice for Hive actions inside Oozie in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-for-Hive-actions-inside-Oozie/m-p/66355#M77254</link>
    <description>There are merits in both approaches, and the right path depends on your requirements. Running all of the statements together in one script is quicker than running them as separate actions [1], but it is less flexible if a failure occurs at any step: you would have to retry the whole script instead of just the failed statements.&lt;BR /&gt;&lt;BR /&gt;Keeping them as separate actions can become a maintenance issue once the number of actions grows large, making refactoring arduous when it is needed. Conversely, running them together makes troubleshooting more involved, since you'll have to go through the logs to find precisely which step failed within the large batch of statements.&lt;BR /&gt;&lt;BR /&gt;I'd advise approaching your workflow from the business-logic side. Split out the parts that can exist as independent steps, and group the parts that are more "atomic" or closely related into a single entity. Get them running, then observe whether any parts need to go quicker. Worrying about performance early can get painful real quick.&lt;BR /&gt;&lt;BR /&gt;[1] There is overhead in running many small, independent actions, since each action spins up a whole 1-map launcher job on YARN, which can cause a slowdown. This overhead is expected to reduce once &lt;A href="https://issues.apache.org/jira/browse/OOZIE-1770" target="_blank"&gt;https://issues.apache.org/jira/browse/OOZIE-1770&lt;/A&gt; is ready and included in a future CDH release, most likely CDH 6.x.</description>
    <pubDate>Mon, 16 Apr 2018 01:31:55 GMT</pubDate>
    <dc:creator>Harsh J</dc:creator>
    <dc:date>2018-04-16T01:31:55Z</dc:date>
    <item>
      <title>Best practice for Hive actions inside Oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-for-Hive-actions-inside-Oozie/m-p/66342#M77253</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;When running Hive commands inside Oozie, is it OK to aggregate them in one script, or is it better to split them up into different Hive actions/scripts?&lt;/P&gt;&lt;P&gt;For example, I need to create several views. Should I put each view creation in a distinct Hive action/script, or can I put all the view creations in a single one?&lt;/P&gt;&lt;P&gt;Which is the best practice, and why?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 13:06:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-for-Hive-actions-inside-Oozie/m-p/66342#M77253</guid>
      <dc:creator>ludof</dc:creator>
      <dc:date>2022-09-16T13:06:16Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for Hive actions inside Oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-for-Hive-actions-inside-Oozie/m-p/66355#M77254</link>
      <description>There are merits in both approaches, and the right path depends on your requirements. Running all of the statements together in one script is quicker than running them as separate actions [1], but it is less flexible if a failure occurs at any step: you would have to retry the whole script instead of just the failed statements.&lt;BR /&gt;&lt;BR /&gt;Keeping them as separate actions can become a maintenance issue once the number of actions grows large, making refactoring arduous when it is needed. Conversely, running them together makes troubleshooting more involved, since you'll have to go through the logs to find precisely which step failed within the large batch of statements.&lt;BR /&gt;&lt;BR /&gt;I'd advise approaching your workflow from the business-logic side. Split out the parts that can exist as independent steps, and group the parts that are more "atomic" or closely related into a single entity. Get them running, then observe whether any parts need to go quicker. Worrying about performance early can get painful real quick.&lt;BR /&gt;&lt;BR /&gt;[1] There is overhead in running many small, independent actions, since each action spins up a whole 1-map launcher job on YARN, which can cause a slowdown. This overhead is expected to reduce once &lt;A href="https://issues.apache.org/jira/browse/OOZIE-1770" target="_blank"&gt;https://issues.apache.org/jira/browse/OOZIE-1770&lt;/A&gt; is ready and included in a future CDH release, most likely CDH 6.x.</description>
      <pubDate>Mon, 16 Apr 2018 01:31:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-for-Hive-actions-inside-Oozie/m-p/66355#M77254</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2018-04-16T01:31:55Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice for Hive actions inside Oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-for-Hive-actions-inside-Oozie/m-p/66369#M77255</link>
      <description>&lt;P&gt;Thank you, that is exactly what I was thinking. With all queries aggregated in one script I gain speed (no YARN container overhead), but in case of an error I lose granularity for debugging.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Apr 2018 08:55:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-for-Hive-actions-inside-Oozie/m-p/66369#M77255</guid>
      <dc:creator>ludof</dc:creator>
      <dc:date>2018-04-16T08:55:05Z</dc:date>
    </item>
  </channel>
</rss>

