
Hive queries use only mappers or only reducers

Rising Star

I'm looking for Hive query scenarios that use only mappers or only reducers.

1 ACCEPTED SOLUTION

Master Guru
4 REPLIES 4


Rising Star

@Shu Thank you for the explanation.

I wanted to know which Hive queries (Hive SQL) have no reducer phase at all, only a mapper phase. Is there such an example?

Master Guru

@Ramya Jayathirtha,

Say I have a foo table with id, name and age columns. Whenever we run

Hive# select name from foo; // the map phase loads the file and, since we only select the name column and do no aggregation or sorting, the mappers read the name field and return the results. No reducer phase is needed.
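
One caveat worth checking on your own cluster: depending on the Hive version and settings, a projection this simple may be served by a fetch task, in which case no job (and so no "Map 1" progress line) shows up at all. A minimal sketch, assuming the same foo table as above and the standard hive.fetch.task.conversion property:

```sql
-- With fetch task conversion enabled, a simple projection may be answered
-- by a fetch task, and no map or reduce job is launched.
SET hive.fetch.task.conversion=more;
SELECT name FROM foo;

-- Turning conversion off forces a job to be launched;
-- for this query that job has a map phase only.
SET hive.fetch.task.conversion=none;
SELECT name FROM foo;
```

Comparing the console output of the two runs should make the difference visible.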

MapSideJoins:-

Joins are usually performed on the reducer side, but we can explicitly tell Hive to load the small table into memory and perform the join in the map phase. In that case no reducer phase is initialized.

Hive# select /*+MAPJOIN(..)*/... // this kind of join loads the small table into memory and does the join in the map phase only.
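
To make the map-join idea concrete, here is a rough sketch. The table bar(id, city) is hypothetical and only stands in for "some small table"; it is not from this thread. The properties used are standard Hive settings, but their defaults differ between versions, so treat the values as assumptions.

```sql
-- Hypothetical small lookup table (an assumption for illustration only):
--   bar(id int, city string)

-- Many Hive versions ignore the MAPJOIN hint by default;
-- this setting asks Hive to honor it.
SET hive.ignore.mapjoin.hint=false;

SELECT /*+ MAPJOIN(b) */ f.id, f.name, b.city
FROM foo f
JOIN bar b ON f.id = b.id;

-- Alternative without a hint: let Hive convert the join to a map join
-- automatically when the small table is below the size threshold (bytes).
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;

SELECT f.id, f.name, b.city
FROM foo f
JOIN bar b ON f.id = b.id;
```

In both variants the small table has to fit in memory; otherwise Hive falls back to a regular reduce-side join.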

Whenever we insert values into a table, loading the data uses only a map phase.

Hive# insert into foo values(1,'abc',200);
INFO  : Map 1: -/-
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Table default.foo stats: [numFiles=5, numRows=5, totalSize=38, rawDataSize=33]

Simple CTAS without Aggregations:-

1. When we do a Create Table As Select (CTAS) with a simple select, only the mapper phase is initialized.

If we do any aggregations, the reducer phase also gets initialized.

Hive# create table foo1 stored as orc as select * from foo;
INFO  : Map 1: -/-
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Table default.foo1 stats: [numFiles=1, numRows=4, totalSize=XXX, rawDataSize=XXXX]
No rows affected (10.247 seconds)
Hive#select * from foo1;
+----------+------------+-----------+--+
| foo1.id  | foo1.name  | foo1.age  |
+----------+------------+-----------+--+
| 1        | a          | 10        |
| 2        | a          | 11        |
| 2        | a          | 10        |
| 3        | b          | 10        |
| 4        | b          | 10        |
| 5        | c          | 10        |
+----------+------------+-----------+--+
6 rows selected (0.205 seconds)

2. If we do a CTAS with a WHERE clause, it is still just a map phase: all the filters in the WHERE clause are applied by the mapper phase itself.

Hive#create table foo as select * from foo1 where id='1';
INFO  : Map 1: -/-
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Table default.foo stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
No rows affected (9.984 seconds)
Hive#SELECT * FROM FOO;
+---------+-----------+----------+--+
| foo.id  | foo.name  | foo.age  |
+---------+-----------+----------+--+
| 1       | a         | 10       |
+---------+-----------+----------+--+
1 row selected (0.099 seconds)
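
If you want to check a query before running it, EXPLAIN prints the plan, and you can look for a reduce stage there. A small sketch, assuming the foo1 table from above; foo2 and foo3 are hypothetical target names used only for illustration:

```sql
-- Filter-only CTAS: the plan should contain a map stage and no reduce stage.
EXPLAIN
CREATE TABLE foo2 AS
SELECT * FROM foo1 WHERE id = '1';

-- CTAS with an aggregation: the plan should now also contain a reduce stage
-- (a "Reduce Operator Tree" on MapReduce, or a "Reducer" vertex on Tez).
EXPLAIN
CREATE TABLE foo3 AS
SELECT name, count(*) AS cnt FROM foo1 GROUP BY name;
```

EXPLAIN only prints the plan, so neither table is actually created.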

Contributor

@Shu

How is the number of mappers and reducers for a given query decided? Is it decided at runtime?

Is it dependent on the number of joins, group by or order by clauses used in the query?

If yes, then please let me know how many mappers and reducers are launched for the below query.

select name, count(*) as cnt from test group by name order by name;