
"Number of reduce tasks is set to 0 since there's no reduce operator": a problem?


New Contributor

Hi all,

Odd question: I'm just starting out in Hadoop and am in the process of moving all my test work into production. However, when working in Hive on the production system I get a strange message: "Number of reduce tasks is set to 0 since there's no reduce operator". The queries are not failing (yet...?), and I have found no strange records in any of the logs I have looked at. I don't know how to troubleshoot this, or whether it is a problem at all. Any advice?

Example:

hive> select * from myTable where daily_date='2015-12-29' limit 10;
Query ID = root_20160519113838_73d2b4dc-efb8-4ea6-b0a4-cdc4dc64c33a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1418226366907_2316, Tracking URL = http://hadoop-head01:8088/proxy/application_1418226366907_2316/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1418226366907_2316
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-19 11:39:07,038 Stage-1 map = 0%,  reduce = 0%
2016-05-19 11:39:12,653 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.65 sec
MapReduce Total cumulative CPU time: 2 seconds 650 msec
Ended Job = job_1418226366907_2316
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.65 sec   HDFS Read: 64722 HDFS Write: 831 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 650 msec
OK
[..... records ....]
Time taken: 15.876 seconds, Fetched: 10 row(s)

Hadoop version: 2.4.0.2.1.3.0-563

Hive version (I think): 0.13.0.2.1.3.0-563

Accepted Solutions

Re: "Number of reduce tasks is set to 0 since there's no reduce operator": a problem?

Expert Contributor

This is not a problem at all.

Hive is just telling you that you are doing a "Map only" job.
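You can confirm this from the console output itself: the line "Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0" reports the reducer count for each stage. Below is a minimal Python sketch that scans a saved copy of that output and lists the map-only stages. It assumes the line format shown in your log (Hive 0.13 running on MapReduce); the function name map_only_stages is hypothetical and not part of any Hive tool.

```python
import re

# Matches console lines such as:
# "Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0"
# (format assumed from the Hive 0.13 / MapReduce output in the question)
STAGE_INFO = re.compile(
    r"Hadoop job information for (?P<stage>Stage-\d+): "
    r"number of mappers: (?P<mappers>\d+); number of reducers: (?P<reducers>\d+)"
)


def map_only_stages(console_output: str) -> list[str]:
    """Return the names of stages that ran with zero reducers (hypothetical helper)."""
    stages = []
    for line in console_output.splitlines():
        match = STAGE_INFO.search(line)
        if match and int(match.group("reducers")) == 0:
            stages.append(match.group("stage"))
    return stages


if __name__ == "__main__":
    sample = """\
Number of reduce tasks is set to 0 since there's no reduce operator
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-19 11:39:12,653 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.65 sec
"""
    print(map_only_stages(sample))  # ['Stage-1']
```

A stage that does have a reduce phase would report a non-zero reducer count on that same line and would not be listed.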

Usually, a MapReduce job has the following steps: Map -> Shuffle -> Reduce. (Nowadays we prefer running Hive on Tez rather than MapReduce, but let's talk about MapReduce here because it is easier to understand.)

The Map and Reduce steps are where computations happen (in Hive: projections, aggregations, filtering...). Shuffle is just data moving over the network, from the nodes that ran the mappers to the nodes that run the reducers.

So whenever a job can run as "Map only" and skip the Shuffle and Reduce steps, that is better: the job will generally be much faster and will use fewer cluster resources (network, CPU, disk and memory).

The query in your example is very simple (a filter plus a LIMIT, with no aggregation, sort or join), which is why Hive can turn it into a "Map only" job.
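To make the difference concrete, here is a toy, in-memory Python model; it is not how Hadoop or Hive actually execute anything. map_only_job mimics your query: each mapper filters its own split and stops after the limit, so nothing is shuffled. map_shuffle_reduce_job mimics an aggregation such as counting rows per daily_date, where all rows with the same key must end up at the same reducer. The split contents and function names are hypothetical, and the "records shuffled" counter only stands in for network traffic.

```python
from collections import defaultdict
from itertools import islice

# Hypothetical input: each inner list is one input split handled by one mapper.
SPLITS = [
    [{"id": 1, "daily_date": "2015-12-29"}, {"id": 2, "daily_date": "2015-12-30"}],
    [{"id": 3, "daily_date": "2015-12-29"}, {"id": 4, "daily_date": "2015-12-29"}],
]


def map_only_job(splits, date, limit):
    """Filter + LIMIT: every mapper works on its own split; nothing is shuffled."""
    results = []
    for split in splits:
        # Each mapper filters its split and stops after `limit` matching rows.
        matches = (row for row in split if row["daily_date"] == date)
        results.extend(islice(matches, limit))
    # In this toy model the final LIMIT is applied when results are collected.
    return results[:limit], 0  # (rows, records shuffled)


def map_shuffle_reduce_job(splits):
    """COUNT(*) per daily_date: rows with the same key must meet in one reducer."""
    # Map: emit a (key, 1) pair per row.
    pairs = [(row["daily_date"], 1) for split in splits for row in split]
    # Shuffle: group pairs by key (on a real cluster this crosses the network).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: combine the values for each key.
    counts = {key: sum(values) for key, values in groups.items()}
    return counts, len(pairs)


rows, shuffled = map_only_job(SPLITS, "2015-12-29", limit=10)
print(len(rows), shuffled)  # 3 0
counts, shuffled = map_shuffle_reduce_job(SPLITS)
print(counts, shuffled)  # {'2015-12-29': 3, '2015-12-30': 1} 4
```

The map-only job never needs to bring rows from different splits together, while the aggregation has to move every (key, value) pair to the reducer that owns its key; that movement is the Shuffle cost a "Map only" job avoids.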

To better understand how Hive queries are turned into MapReduce/Tez jobs, have a look at the EXPLAIN command:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

Re: "Number of reduce tasks is set to 0 since there's no reduce operator": a problem?

@R Wys

This is not an issue. Your query is a plain "select *" with a filter and a LIMIT, which needs no aggregation or sorting, so Hive works out from the operators in the query that no reduce tasks are required.

Re: "Number of reduce tasks is set to 0 since there's no reduce operator": a problem?

@R Wys

There is no problem with Hive here: Hive has generated an execution plan with no reduce phase for your query. You can see the plan by running: explain select * from myTable where daily_date='2015-12-29' limit 10;