<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive Explain plan Interpretation in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214816#M176728</link>
    <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/3057/sreeviswaathikala.html" nodeid="3057"&gt;@Viswa&lt;/A&gt;, the best way to influence performance and improve the explain plan is to keep your table statistics up to date. Hive doesn't provide an auto update stat option, so if there are significant table changes, you'll want to update the statistics periodically with ANALYZE TABLE. Also be sure you've turned on the Cost Based Optimizer (CBO). Hive has a CBO and a rule-based optimizer; you'll want both. Finally, another benefit of ANALYZE TABLE is that, if you are using LLAP, running it will cache the table.&lt;/P&gt;&lt;P&gt;Also, a broadcast edge means there was a broadcast join.&lt;/P&gt;</description>
    <pubDate>Fri, 30 Jun 2017 19:50:05 GMT</pubDate>
    <dc:creator>SQLShaw</dc:creator>
    <dc:date>2017-06-30T19:50:05Z</dc:date>
    <item>
      <title>Hive Explain plan Interpretation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214812#M176724</link>
      <description>&lt;P&gt;I am trying to understand a Hive explain plan. A sample explain plan for my query is below:&lt;/P&gt;&lt;PRE&gt;Vertex dependency in root stage
Reducer 2 &amp;lt;- Map 1 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE)
Reducer 3 &amp;lt;- Map 4 (SIMPLE_EDGE), Map 5 (BROADCAST_EDGE), Reducer 2 (SIMPLE_EDGE)
Reducer 4 &amp;lt;- Map 6 (SIMPLE_EDGE), Map 7 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Reducer 3 (SIMPLE_EDGE)&lt;/PRE&gt;&lt;P&gt;Can someone help me understand what SIMPLE_EDGE and BROADCAST_EDGE mean, and what I should interpret from each?&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2017 01:33:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214812#M176724</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2017-06-23T01:33:51Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Explain plan Interpretation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214813#M176725</link>
      <description>&lt;P&gt;You are probably using Hive on Tez, which offers a user-level explain. Apply the setting below and then run your 'explain' query again to see a more readable tree of operations. This is also available for Hive on Spark, where the setting is called 'hive.spark.explain.user'.&lt;/P&gt;&lt;PRE&gt;set hive.explain.user=true;&lt;/PRE&gt;</description>
      <pubDate>Fri, 23 Jun 2017 01:58:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214813#M176725</guid>
      <dc:creator>rreddy</dc:creator>
      <dc:date>2017-06-23T01:58:43Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Explain plan Interpretation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214814#M176726</link>
      <description>&lt;P&gt;After adding this setting, I am still getting the same explain plan, with nothing additional.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2017 02:32:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214814#M176726</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2017-06-23T02:32:25Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Explain plan Interpretation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214815#M176727</link>
      <description>&lt;P&gt;If you are using Hive from HDP 2.6.0 or later, you might get help understanding the query execution by using the &lt;EM&gt;visual explain plan &lt;/EM&gt;feature in Hive Views 2.0 of Ambari. Open the &lt;A href="https://community.cloudera.com/"&gt;Query Tab&lt;/A&gt; documentation of the &lt;EM&gt;Ambari Views Guide, &lt;/EM&gt;and search for the "Visual Explain Plan" section.&lt;/P&gt;</description>
      <pubDate>Sat, 24 Jun 2017 05:01:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214815#M176727</guid>
      <dc:creator>fwelsch</dc:creator>
      <dc:date>2017-06-24T05:01:07Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Explain plan Interpretation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214816#M176728</link>
      <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/3057/sreeviswaathikala.html" nodeid="3057"&gt;@Viswa&lt;/A&gt;, the best way to influence performance and improve the explain plan is to keep your table statistics up to date. Hive doesn't provide an auto update stat option, so if there are significant table changes, you'll want to update the statistics periodically with ANALYZE TABLE. Also be sure you've turned on the Cost Based Optimizer (CBO). Hive has a CBO and a rule-based optimizer; you'll want both. Finally, another benefit of ANALYZE TABLE is that, if you are using LLAP, running it will cache the table.&lt;/P&gt;&lt;P&gt;Also, a broadcast edge means there was a broadcast join.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Jun 2017 19:50:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214816#M176728</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2017-06-30T19:50:05Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Explain plan Interpretation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214817#M176729</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/3057/sreeviswaathikala.html" nodeid="3057"&gt;@Viswa&lt;/A&gt;&lt;P&gt;Can you post your query and the full explain plan? It looks like not all of the output is there, so it is hard for anyone to explain what it is doing.&lt;/P&gt;&lt;P&gt;In the meantime, here is a helpful presentation about reading Hive explain plans: &lt;A href="https://www.slideshare.net/HadoopSummit/how-to-understand-and-analyze-apache-hive-query-execution-plan-for-performance-debugging" target="_blank"&gt;https://www.slideshare.net/HadoopSummit/how-to-understand-and-analyze-apache-hive-query-execution-plan-for-performance-debugging&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Assuming you're using the new Hive explain plan (hive.explain.user=true), here are some quick general tips:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Data flows from the bottom of the explain plan to the top.&lt;/LI&gt;&lt;LI&gt;Operators can have multiple children (e.g., to do a MAPJOIN you might need to do a MAP and a FILTER).&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Fri, 30 Jun 2017 20:22:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214817#M176729</guid>
      <dc:creator>christopher_w_m</dc:creator>
      <dc:date>2017-06-30T20:22:35Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Explain plan Interpretation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214818#M176730</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3057/sreeviswaathikala.html" nodeid="3057"&gt;@Viswa&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In Tez, the following types of &lt;A href="https://tez.apache.org/releases/0.8.5/tez-api-javadocs/org/apache/tez/dag/api/EdgeProperty.DataMovementType.html"&gt;data movement&lt;/A&gt; can take place between two vertices; each is represented by an edge in the DAG.&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;CODE&gt;&lt;STRONG&gt;&lt;A href="https://tez.apache.org/releases/0.8.5/tez-api-javadocs/org/apache/tez/dag/api/EdgeProperty.DataMovementType.html#BROADCAST"&gt;BROADCAST&lt;/A&gt;&lt;/STRONG&gt;&lt;/CODE&gt;&lt;/TD&gt;&lt;TD&gt;Output on this edge produced by any source task is available to all destination tasks.&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;CODE&gt;&lt;STRONG&gt;&lt;A href="https://tez.apache.org/releases/0.8.5/tez-api-javadocs/org/apache/tez/dag/api/EdgeProperty.DataMovementType.html#CUSTOM"&gt;CUSTOM&lt;/A&gt;&lt;/STRONG&gt;&lt;/CODE&gt;&lt;/TD&gt;&lt;TD&gt;Custom routing defined by the user.&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;CODE&gt;&lt;STRONG&gt;&lt;A href="https://tez.apache.org/releases/0.8.5/tez-api-javadocs/org/apache/tez/dag/api/EdgeProperty.DataMovementType.html#ONE_TO_ONE"&gt;ONE_TO_ONE&lt;/A&gt;&lt;/STRONG&gt;&lt;/CODE&gt;&lt;/TD&gt;&lt;TD&gt;Output on this edge produced by the i-th source task is available to the i-th destination task.&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;CODE&gt;&lt;STRONG&gt;&lt;A href="https://tez.apache.org/releases/0.8.5/tez-api-javadocs/org/apache/tez/dag/api/EdgeProperty.DataMovementType.html#SCATTER_GATHER"&gt;SCATTER_GATHER&lt;/A&gt;&lt;/STRONG&gt;&lt;/CODE&gt;&lt;/TD&gt;&lt;TD&gt;The i-th output on this edge produced by all source tasks is available to the same destination task.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;To answer your question:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;SIMPLE_EDGE refers to the data movement type SCATTER_GATHER (example: SHUFFLE JOIN).&lt;/LI&gt;&lt;LI&gt;BROADCAST_EDGE refers to the data movement type BROADCAST (example: MAP JOIN).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I drew the above inference from &lt;A href="http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?av=f"&gt;createEdgeProperty() in the source code&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Sat, 01 Jul 2017 03:20:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Explain-plan-Interpretation/m-p/214818#M176730</guid>
      <dc:creator>dineshc</dc:creator>
      <dc:date>2017-07-01T03:20:05Z</dc:date>
    </item>
  </channel>
</rss>

