Created 05-19-2016 11:17 AM
Hi all,
Odd question - I'm just starting out with Hadoop and am in the process of moving all my test work into production. However, on the prod system I get a strange message when working in Hive: "Number of reduce tasks is set to 0 since there's no reduce operator". The queries are not failing (yet...?), and there are no unusual records in any of the logs I have looked at. I don't know how to troubleshoot this, if it is even a problem at all. Any advice?
Example:
hive> select * from myTable where daily_date='2015-12-29' limit 10;
Query ID = root_20160519113838_73d2b4dc-efb8-4ea6-b0a4-cdc4dc64c33a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1418226366907_2316, Tracking URL = http://hadoop-head01:8088/proxy/application_1418226366907_2316/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1418226366907_2316
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-19 11:39:07,038 Stage-1 map = 0%, reduce = 0%
2016-05-19 11:39:12,653 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.65 sec
MapReduce Total cumulative CPU time: 2 seconds 650 msec
Ended Job = job_1418226366907_2316
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.65 sec   HDFS Read: 64722   HDFS Write: 831   SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 650 msec
OK
[..... records ....]
Time taken: 15.876 seconds, Fetched: 10 row(s)
Hadoop version- 2.4.0.2.1.3.0-563
Hive version...(?) 0.13.0.2.1.3.0-563
Created 05-19-2016 11:24 AM
This is not a problem at all.
Hive is just telling you that you are doing a "Map only" job.
Usually, in MapReduce (Hive now prefers Tez over MapReduce, but let's talk about MapReduce here because it is easier to understand), your job will have the following steps: Map -> Shuffle -> Reduce.
The Map and Reduce steps are where the computations (in Hive: projections, aggregations, filtering...) happen. The Shuffle step is just data moving over the network, from the nodes that ran the mappers to the ones that will run the reducers.
So if it is possible to do a "Map only" job and avoid the "Shuffle" and "Reduce" steps, so much the better: your job will generally be much faster and will use fewer cluster resources (network, CPU, disk & memory).
The query you are showing in this example is very simple, which is why Hive can turn it into a "Map only" job.
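For instance, your filter-and-limit query can be answered by the mappers alone, whereas an aggregation forces a shuffle and a reduce phase. A rough illustration, reusing the table from your example (the second query is just hypothetical, to show the contrast):

select * from myTable where daily_date='2015-12-29' limit 10;   -- map only: each mapper just filters rows and emits them

select daily_date, count(*) from myTable group by daily_date;   -- needs reducers: rows with the same daily_date must be shuffled to the same reducer to be counted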
To better understand how Hive queries are transformed into MapReduce/Tez jobs, you can have a look at the "explain" command:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
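For example, you can run it on your own query (the output below is only a rough sketch; the exact plan depends on your Hive version and table layout):

hive> explain select * from myTable where daily_date='2015-12-29' limit 10;

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            Filter Operator
              Select Operator
                Limit
                ...

The point to look for is that the plan only contains a "Map Operator Tree" and no "Reduce Operator Tree", which is Hive's way of saying the job is map-only.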
Created 05-19-2016 11:21 AM
This is not an issue. You are using "select *" with a simple filter, which doesn't require any aggregation, so there is nothing for a reducer to do. The MapReduce framework is smart enough to figure out when reduce tasks are required based on the operators in the query.
Created 05-19-2016 11:27 AM
There is no problem with Hive here; Hive has generated an execution plan with no reduce phase in your case. You can see the plan by running: explain select * from myTable where daily_date='2015-12-29' limit 10