<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Hive queries use only mappers or only reducers in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220635#M69637</link>
    <description>&lt;P&gt;I'm looking for Hive query scenarios that use only mappers or only reducers.&lt;/P&gt;</description>
    <pubDate>Sun, 15 Oct 2017 00:06:17 GMT</pubDate>
    <dc:creator>rmy1712</dc:creator>
    <dc:date>2017-10-15T00:06:17Z</dc:date>
    <item>
      <title>Hive queries use only mappers or only reducers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220635#M69637</link>
      <description>&lt;P&gt;I'm looking for Hive query scenarios that use only mappers or only reducers.&lt;/P&gt;</description>
      <pubDate>Sun, 15 Oct 2017 00:06:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220635#M69637</guid>
      <dc:creator>rmy1712</dc:creator>
      <dc:date>2017-10-15T00:06:17Z</dc:date>
    </item>
    <item>
      <title>Re: Hive queries use only mappers or only reducers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220636#M69638</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/30206/rmy1712.html" nodeid="30206"&gt;@Ramya Jayathirtha&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In Hive, if you run a simple query like &lt;STRONG&gt;select * from table&lt;/STRONG&gt;, no map or reduce tasks are launched, because Hive just fetches the data and returns it as is.&lt;/P&gt;&lt;PRE&gt;Hive# select * from foo;
+---------+-----------+----------+--+
| foo.id  | foo.name  | foo.age  |
+---------+-----------+----------+--+
| 1       | a         | 10       |
| 2       | a         | 10       |
| 3       | b         | 10       |
| 4       | c         | 20       |
+---------+-----------+----------+--+
4 rows selected (0.116 seconds)&lt;/PRE&gt;&lt;P&gt;You can prefix your query with &lt;STRONG&gt;explain&lt;/STRONG&gt; to display how the execution engine is going to run it, including how many map and reduce phases the query needs. For the query above, the plan contains only a Fetch Operator and no map or reduce stage:&lt;/P&gt;&lt;PRE&gt;Hive# explain select * from foo;
+-------------------------------------------------------+--+
|                        Explain                        |
+-------------------------------------------------------+--+
| Plan not optimized by CBO.                            |
|                                                       |
| Stage-0                                               |
|    Fetch Operator                                     |
|       limit:-1                                        |
|       Select Operator [SEL_5652]                      |
|          outputColumnNames:["_col0","_col1","_col2"]  |
|          TableScan [TS_5651]                          |
|             alias:foo                                 |
|                                                       |
+-------------------------------------------------------+--+&lt;/PRE&gt;&lt;P&gt;Whenever you do &lt;STRONG&gt;aggregations&lt;/STRONG&gt;, a &lt;STRONG&gt;reducer phase&lt;/STRONG&gt; is executed along with the map phase.&lt;/P&gt;&lt;PRE&gt;Hive# select count(*) from foo group by name;
INFO  : Map 1: 0/1      Reducer 2: 0/2
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/2
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/2
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/2
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/2
INFO  : Map 1: 1/1      Reducer 2: 0/1
INFO  : Map 1: 1/1      Reducer 2: 0(+1)/1
INFO  : Map 1: 1/1      Reducer 2: 1/1
+------+--+
| _c0  |
+------+--+
| 2    |
| 1    |
| 1    |
+------+--+
3 rows selected (13.709 seconds)&lt;/PRE&gt;&lt;P&gt;If you add &lt;STRONG&gt;explain&lt;/STRONG&gt; in front of the above query, the plan shows this vertex dependency:&lt;/P&gt;&lt;PRE&gt;Hive# explain select count(*) from foo group by name;
Reducer 2 &amp;lt;- Map 1 (SIMPLE_EDGE)&lt;/PRE&gt;&lt;P&gt;As you can see, there is a reducer phase along with the map phase.&lt;/P&gt;&lt;P&gt;We can add another &lt;STRONG&gt;reducer phase&lt;/STRONG&gt; to the above query by &lt;STRONG&gt;adding an order by&lt;/STRONG&gt; clause:&lt;/P&gt;&lt;PRE&gt;Hive# select count(*) cnt from foo group by name order by cnt;
INFO  : Map 1: 0/1      Reducer 2: 0/2  Reducer 3: 0/1
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/2  Reducer 3: 0/1
INFO  : Map 1: 1/1      Reducer 2: 0/1  Reducer 3: 0/1
INFO  : Map 1: 1/1      Reducer 2: 0(+1)/1      Reducer 3: 0/1
INFO  : Map 1: 1/1      Reducer 2: 1/1  Reducer 3: 0(+1)/1
INFO  : Map 1: 1/1      Reducer 2: 1/1  Reducer 3: 1/1
+------+--+
| cnt  |
+------+--+
| 1    |
| 1    |
| 2    |
+------+--+&lt;/PRE&gt;&lt;P&gt;You can see that two reducer phases run, because after aggregating we order the results:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Map 1:&lt;/STRONG&gt; loads the data from HDFS.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Reducer 2:&lt;/STRONG&gt; does the aggregation (the count per name).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Reducer 3:&lt;/STRONG&gt; sorts the aggregated results in ascending order of cnt.&lt;/P&gt;&lt;P&gt;If you run explain on the above query, the vertex dependencies show both reducer phases:&lt;/P&gt;&lt;PRE&gt;Hive# explain select count(*) cnt from foo group by name order by cnt;&lt;/PRE&gt;&lt;PRE&gt; Vertex dependency in root stage
 Reducer 2 &amp;lt;- Map 1 (SIMPLE_EDGE)    
 Reducer 3 &amp;lt;- Reducer 2 (SIMPLE_EDGE)&lt;/PRE&gt;</description>
      <pubDate>Sun, 15 Oct 2017 06:24:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220636#M69638</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2017-10-15T06:24:15Z</dc:date>
    </item>
    <item>
      <title>Re: Hive queries use only mappers or only reducers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220637#M69639</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt; Thank you for the explanation.&lt;/P&gt;&lt;P&gt;I wanted to know Hive queries (Hive sql) where there is no reducer phase at all, only mapper phase. Is there such an example ?&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2017 04:30:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220637#M69639</guid>
      <dc:creator>rmy1712</dc:creator>
      <dc:date>2017-10-17T04:30:04Z</dc:date>
    </item>
    <item>
      <title>Re: Hive queries use only mappers or only reducers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220638#M69640</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/30206/rmy1712.html" nodeid="30206"&gt;@Ramya Jayathirtha&lt;/A&gt;, &lt;/P&gt;&lt;P&gt;As i'm having id,name,age columns in foo table when ever we does &lt;/P&gt;&lt;PRE&gt;Hive# select name from foo; //in this case first map phase will loads the file and we only selected name column, we are not doing any filtering kind of things here so map phase checks name field and gives results.&lt;/PRE&gt;&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;MapSideJoins:-&lt;/STRONG&gt;&lt;/U&gt;&lt;STRONG&gt; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;Usually all joins will perform on reducer side as we can explicitly mention load tables to memory and performs joins, no reducer phase will be initialized.&lt;/P&gt;&lt;PRE&gt;Hive# select /*+MAPJOIN(..)*/... //this kind of joins will loads small table to memory and does the join on map phase only.&lt;/PRE&gt;&lt;P&gt;When ever we do insert values into table and loading the data should be used only &lt;STRONG&gt;map&lt;/STRONG&gt; phase.&lt;/P&gt;&lt;PRE&gt;Hive# insert into foo values(1,'abc',200);
INFO  : Map 1: -/-
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Table default.foo stats: [numFiles=5, numRows=5, totalSize=38, rawDataSize=33]&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Simple CTAS without aggregations:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;When we do a &lt;STRONG&gt;CREATE TABLE AS SELECT with a simple select&lt;/STRONG&gt;, only a &lt;STRONG&gt;mapper phase&lt;/STRONG&gt; is initialized.&lt;/P&gt;&lt;P&gt;If we do any &lt;STRONG&gt;aggregations&lt;/STRONG&gt;, a &lt;STRONG&gt;reducer phase&lt;/STRONG&gt; gets initialized as well.&lt;/P&gt;&lt;PRE&gt;Hive# create table foo1 stored as orc as select * from foo;
INFO  : Map 1: -/-
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Table default.foo1 stats: [numFiles=1, numRows=4, totalSize=XXX, rawDataSize=XXXX]
No rows affected (10.247 seconds)&lt;/PRE&gt;&lt;PRE&gt;Hive# select * from foo1;
+----------+------------+-----------+--+
| foo1.id  | foo1.name  | foo1.age  |
+----------+------------+-----------+--+
| 1        | a          | 10        |
| 2        | a          | 11        |
| 2        | a          | 10        |
| 3        | b          | 10        |
| 4        | b          | 10        |
| 5        | c          | 10        |
+----------+------------+-----------+--+
6 rows selected (0.205 seconds)&lt;/PRE&gt;&lt;P&gt;If we do a CTAS with a &lt;STRONG&gt;WHERE clause&lt;/STRONG&gt;, it is still just a &lt;STRONG&gt;map phase&lt;/STRONG&gt;: all the filters in the &lt;STRONG&gt;WHERE clause&lt;/STRONG&gt; are applied by the mapper phase itself.&lt;/P&gt;&lt;PRE&gt;Hive# create table foo as select * from foo1 where id='1';
INFO  : Map 1: -/-
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Table default.foo stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
No rows affected (9.984 seconds)&lt;/PRE&gt;&lt;PRE&gt;Hive# SELECT * FROM FOO;
+---------+-----------+----------+--+
| foo.id  | foo.name  | foo.age  |
+---------+-----------+----------+--+
| 1       | a         | 10       |
+---------+-----------+----------+--+
1 row selected (0.099 seconds)&lt;/PRE&gt;</description>
      <pubDate>Tue, 17 Oct 2017 05:07:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220638#M69640</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2017-10-17T05:07:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hive queries use only mappers or only reducers</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220639#M69641</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt; &lt;/P&gt;&lt;P&gt;How is number of Mappers/reducers decided for a given query will be decided in runtime ?&lt;/P&gt;&lt;P&gt;Is it dependet on how many number of Joins or group by or order by clauses that are used in the query ?&lt;/P&gt;&lt;P&gt;If yes, then please let me know how many mappers and reducers are launched for the below query.&lt;/P&gt;&lt;P&gt;select name, count(*) as cnt from test group by name order by name;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Mar 2018 01:50:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-queries-use-only-mappers-or-only-reducers/m-p/220639#M69641</guid>
      <dc:creator>rakesh_an1992</dc:creator>
      <dc:date>2018-03-07T01:50:31Z</dc:date>
    </item>
  </channel>
</rss>

