Created 05-19-2016 11:17 AM
Hi all,
Odd question - I'm just starting out with Hadoop and am in the process of moving all my test work into production. However, on the prod system I get a strange message when working in Hive: "Number of reduce tasks is set to 0 since there's no reduce operator". The queries are not failing (yet...?), and there are no unusual records in any of the logs I have looked at. I don't know how to troubleshoot this, if it is even a problem at all. Any advice?
Example:
hive> select * from myTable where daily_date='2015-12-29' limit 10;
Query ID = root_20160519113838_73d2b4dc-efb8-4ea6-b0a4-cdc4dc64c33a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1418226366907_2316, Tracking URL = http://hadoop-head01:8088/proxy/application_1418226366907_2316/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1418226366907_2316
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-19 11:39:07,038 Stage-1 map = 0%, reduce = 0%
2016-05-19 11:39:12,653 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.65 sec
MapReduce Total cumulative CPU time: 2 seconds 650 msec
Ended Job = job_1418226366907_2316
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.65 sec   HDFS Read: 64722   HDFS Write: 831   SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 650 msec
OK
[..... records ....]
Time taken: 15.876 seconds, Fetched: 10 row(s)
Hadoop version- 2.4.0.2.1.3.0-563
Hive version...(?) 0.13.0.2.1.3.0-563
Created 05-19-2016 11:24 AM
This is not a problem at all.
Hive is just telling you that you are doing a "Map only" job.
Usually, in MapReduce (Hive now prefers Tez over MapReduce, but let's talk about MapReduce here because it is easier to understand), your job will have the following steps: Map -> Shuffle -> Reduce.
The Map and Reduce steps are where the computations (in Hive: projections, aggregations, filtering...) happen. The Shuffle step is just data moving over the network, from the nodes that ran the mappers to the ones that will run the reducers.
So if it is possible to do a "Map only" job and avoid the "Shuffle" and "Reduce" steps, so much the better: your job will generally be much faster and will use fewer cluster resources (network, CPU, disk & memory).
The query you are showing in this example is very simple, which is why Hive can turn it into a "Map only" job.
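For instance, your filter-and-limit query can be answered by the mappers alone, whereas an aggregation forces a shuffle and a reduce phase. A rough illustration, reusing the table from your example (the second query is just hypothetical, to show the contrast):

select * from myTable where daily_date='2015-12-29' limit 10;   -- map only: each mapper just filters rows and emits them

select daily_date, count(*) from myTable group by daily_date;   -- needs reducers: rows with the same daily_date must be shuffled to the same reducer to be counted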
To better understand how Hive queries are transformed into MapReduce/Tez jobs, you can have a look at the "explain" command:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
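For example, you can run it on your own query (the output below is only a rough sketch; the exact plan depends on your Hive version and table layout):

hive> explain select * from myTable where daily_date='2015-12-29' limit 10;

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            Filter Operator
              Select Operator
                Limit
                ...

The point to look for is that the plan only contains a "Map Operator Tree" and no "Reduce Operator Tree", which is Hive's way of saying the job is map-only.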
Created 05-19-2016 11:21 AM
This is not an issue. You are using "select *" with a simple filter, which doesn't require any aggregation, so there is nothing for a reducer to do. The MapReduce framework is smart enough to figure out when reduce tasks are required based on the operators in the query.
Created 05-19-2016 11:27 AM
There is no problem with Hive here; Hive has generated an execution plan with no reduce phase in your case. You can see the plan by running: explain select * from myTable where daily_date='2015-12-29' limit 10