About nikhil_raina

shobikas · ‎01-08-2019

@Nikhil Raina In hadoop, Mapreduce breaks the jobs into task and these task runs in a parallel way. So that the overall execution time may reduce. Now among the divided tasks, if one of the tasks take more time than desired, then the overall execution time of job increases. The reason can be anything: node busy, network congestion, etc, which limits the total execution time of the Job, and the system should wait for the slow running tasks to be completed. It may be difficult to detect causes since the tasks still complete successfully, although more time is taken than the expected time. Hadoop doesn’t try to diagnose and fix slow running tasks, instead, it tries to detect them and runs backup tasks for them. The backup tasks will be preferentially scheduled on the faster nodes. This is called "speculative execution" in Hadoop. The "backup task" are "speculative Tasks". When a task successfully completes, then duplicate tasks that are running are killed since they are no longer needed. If the original task finishes first, then the speculative task will be killed. On the other hand, if the speculative task finishes first, then the original one will be killed. Simply, "Speculative execution" is a "MapReduce job optimization technique" in Hadoop that is enabled by default. To disable that set the property value "mapred.map.tasks.speculative.execution" - "false" and "mapred.reduce.tasks.speculative.execution" - "false" in "mapred-site.xml". Please accept this answer if you found it helpful.

Online	Offline
Last Visited	‎01-08-2019 10:32 AM

Member Since	‎07-12-2018 06:05 AM
Last Visited	‎01-08-2019 10:32 AM
Posts	2

Cloudera Community

Re: what is speculative execution