Support Questions

rachel_wijsmull · ‎05-19-2016

Hi all,

Odd question: I'm just starting out with Hadoop and am in the process of moving all my test work into production. On the prod system I get a strange message when working in Hive: "Number of reduce tasks is set to 0 since there's no reduce operator". The queries are not failing (yet...?), and there are no strange records in any of the logs I have looked at. I don't know how to troubleshoot this, if indeed it is a problem at all. Any advice?

Example:

hive> select * from myTable where daily_date='2015-12-29' limit 10;
Query ID = root_20160519113838_73d2b4dc-efb8-4ea6-b0a4-cdc4dc64c33a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1418226366907_2316, Tracking URL = http://hadoop-head01:8088/proxy/application_1418226366907_2316/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1418226366907_2316
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-19 11:39:07,038 Stage-1 map = 0%,  reduce = 0%
2016-05-19 11:39:12,653 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.65 sec
MapReduce Total cumulative CPU time: 2 seconds 650 msec
Ended Job = job_1418226366907_2316
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.65 sec   HDFS Read: 64722 HDFS Write: 831 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 650 msec
OK
[..... records ....]
Time taken: 15.876 seconds, Fetched: 10 row(s)

Hadoop version: 2.4.0.2.1.3.0-563

Hive version (I think): 0.13.0.2.1.3.0-563

sluangsay · ‎05-19-2016

This is not a problem at all.

Hive is just telling you that you are doing a "Map only" job.

Usually, in MapReduce (in Hive we now prefer Tez over MapReduce, but let's talk about MapReduce here because it is easier to understand), your job will have the following steps: Map -> Shuffle -> Reduce.

The Map and Reduce steps are where the computations (in Hive: projections, aggregations, filtering...) happen. Shuffle is just data moving over the network, from the nodes that ran the mappers to the ones that run the reducers.

So if a job can run as "Map only" and skip the "Shuffle" and "Reduce" steps, so much the better: it will generally be much faster and use fewer cluster resources (network, CPU, disk & memory).

The query in your example is very simple: it only filters rows and limits the output, with no aggregation or sorting. That is why Hive can turn it into a "Map only" job.
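To make the difference concrete, here is a minimal Python sketch (not Hive code, just an illustration of the idea). The rows, the two "splits", and the function names are all made up for the example. The first function mimics a filter-and-limit query, where each mapper can work on its own split independently. The second mimics a GROUP BY count, which typically needs the shuffle and reduce steps because rows with the same key have to be brought together.

```python
from collections import defaultdict

# Hypothetical rows standing in for myTable, divided into two input splits
# (one split per mapper). The data is invented for illustration only.
SPLITS = [
    [{"id": 1, "daily_date": "2015-12-29"}, {"id": 2, "daily_date": "2015-12-30"}],
    [{"id": 3, "daily_date": "2015-12-29"}, {"id": 4, "daily_date": "2015-12-31"}],
]


def map_only_filter(splits, wanted_date, limit):
    """Roughly like: SELECT * ... WHERE daily_date = wanted_date LIMIT limit.

    Each mapper filters its own split; nothing has to be regrouped by key,
    so there is no shuffle and no reducer.
    """
    rows = []
    for split in splits:  # one loop iteration per mapper
        rows.extend(row for row in split if row["daily_date"] == wanted_date)
    return rows[:limit]


def map_shuffle_reduce_count(splits):
    """Roughly like: SELECT daily_date, count(*) ... GROUP BY daily_date."""
    # Map: every mapper emits (key, 1) pairs for the rows in its split.
    mapped = [(row["daily_date"], 1) for split in splits for row in split]
    # Shuffle: pairs with the same key are brought together.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: one reduce call per key combines that key's values.
    return {key: sum(values) for key, values in grouped.items()}


filtered = map_only_filter(SPLITS, "2015-12-29", limit=10)
counts = map_shuffle_reduce_count(SPLITS)
print(filtered)
print(counts)
```

In the first function the work of each split never depends on another split, which is the situation Hive is reporting when it says there is no reduce operator.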

To better understand how Hive queries are translated into MapReduce/Tez jobs, have a look at the "explain" command:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

jyadav · ‎05-19-2016

@R Wys

This is not an issue. You are running a "select *" with a simple filter, which doesn't require any aggregation, so no reduce step is needed. Hive works out from the operators in the query whether reducer tasks are required.

rajkumar_singh · ‎05-19-2016

@R Wys

There is no problem with Hive here. In your case Hive has generated an execution plan with no reduce phase. You can see the plan by running: explain select * from myTable where daily_date='2015-12-29' limit 10;
