What is speculative execution?

Hi @Nikhil Raina,

In simple words, speculative execution means that Hadoop does not try to fix slow tasks, because the cause (misconfiguration, hardware problems, etc.) is hard to diagnose. Instead, for each task that runs slower than expected, it launches a parallel backup copy of that task on a faster node. These backup copies are called speculative tasks. Whether speculative execution pays off depends on the use case, so it can be enabled or disabled, and it is up to the Hadoop administrator to decide whether it is beneficial: the backup tasks consume extra cluster resources and therefore affect overall cluster throughput.
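To make the detection step concrete, here is a minimal Python sketch under simplifying assumptions: each running task reports a progress fraction and its elapsed time, remaining time is extrapolated linearly, and a task is flagged when its estimate exceeds twice the median of its peers. `TaskStatus`, `pick_speculation_candidates`, `assign_backups` and the node speed scores are hypothetical names for illustration; this is not Hadoop's actual speculator, which uses its own estimators and thresholds.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskStatus:
    """Progress snapshot of one running task attempt (hypothetical, not a Hadoop class)."""
    task_id: str
    progress: float  # fraction of work done, 0.0 to 1.0
    elapsed: float   # seconds since the attempt started


def estimated_remaining(t: TaskStatus) -> float:
    # Linear extrapolation: if a fraction p of the work took `elapsed` seconds,
    # the remaining (1 - p) should take elapsed * (1 - p) / p seconds.
    return t.elapsed * (1.0 - t.progress) / t.progress


def pick_speculation_candidates(tasks: list[TaskStatus], slowdown: float = 2.0) -> list[str]:
    """Return ids of tasks expected to finish much later than their peers."""
    # Finished tasks need no backup; tasks with zero progress cannot be extrapolated.
    running = [t for t in tasks if 0.0 < t.progress < 1.0]
    if not running:
        return []
    typical = median(estimated_remaining(t) for t in running)
    return [t.task_id for t in running if estimated_remaining(t) > slowdown * typical]


def assign_backups(candidates: list[str], idle_nodes: dict[str, float]) -> dict[str, str]:
    """Give each candidate the fastest idle node still free (speed scores are hypothetical)."""
    fastest_first = sorted(idle_nodes, key=idle_nodes.get, reverse=True)
    # zip() stops at the shorter list: with fewer idle nodes than stragglers,
    # some candidates simply get no backup attempt.
    return dict(zip(candidates, fastest_first))


tasks = [
    TaskStatus("m0", 0.80, 60.0),
    TaskStatus("m1", 0.75, 60.0),
    TaskStatus("m2", 0.80, 60.0),
    TaskStatus("m3", 0.20, 60.0),  # straggler: about 240 s left vs. 15-20 s for the rest
]
slow = pick_speculation_candidates(tasks)
print(slow)                                                  # ['m3']
print(assign_backups(slow, {"node-a": 1.0, "node-b": 1.5}))  # {'m3': 'node-b'}
```

Comparing against the median rather than the mean keeps a single very slow task from inflating the threshold it is measured against. Each backup also occupies a node that could have run other work, which is the throughput and resource cost mentioned above.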

You can find this in both MapReduce and Spark, for example.

Hope it helps,

David

@Nikhil Raina

In Hadoop, MapReduce breaks a job into tasks, and these tasks run in parallel so that the overall execution time is reduced. However, if one of those tasks takes longer than expected, the whole job is delayed: the job cannot finish until its slowest task finishes, so the system has to wait for the slow-running tasks to complete. The cause can be anything, such as a busy node or network congestion. Such causes are hard to detect because the slow tasks still complete successfully; they just take longer than expected.

Hadoop doesn't try to diagnose and fix slow-running tasks. Instead, it tries to detect them and runs backup tasks for them, preferentially scheduled on faster nodes. This is called "speculative execution" in Hadoop, and the backup tasks are called "speculative tasks". As soon as one copy of a task completes successfully, any duplicate copies still running are killed because they are no longer needed: if the original task finishes first, the speculative task is killed; if the speculative task finishes first, the original one is killed.

In short, speculative execution is a MapReduce job optimization technique in Hadoop, and it is enabled by default. To disable it, set the properties "mapred.map.tasks.speculative.execution" and "mapred.reduce.tasks.speculative.execution" to "false" in "mapred-site.xml". On Hadoop 2 and later, these names are deprecated aliases for "mapreduce.map.speculative" and "mapreduce.reduce.speculative".
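The "first copy to finish wins" rule, and why it shortens the job, can be sketched as a small Python model. This is a hypothetical illustration, not Hadoop code: each attempt is reduced to the absolute time at which it would finish if left alone, and the names `Attempt`, `resolve` and `job_finish_time` are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    """One attempt of a task (hypothetical model, not a Hadoop class)."""
    name: str
    # Absolute time the attempt would complete if never killed. For a backup this
    # already includes the delay before the straggler was noticed and the backup started.
    finish: float


def resolve(original: Attempt, backup: Optional[Attempt]) -> tuple[Attempt, Optional[Attempt]]:
    """Return (winner, killed); killed is None when no backup was launched."""
    # The first attempt to finish wins and the other is killed; ties go to the original.
    if backup is None or original.finish <= backup.finish:
        return original, backup
    return backup, original


def job_finish_time(tasks: dict[str, tuple[Attempt, Optional[Attempt]]]) -> float:
    """The job is done only when its last task is done, so take the latest winning finish."""
    return max(resolve(original, backup)[0].finish for original, backup in tasks.values())


no_spec = {
    "m0": (Attempt("m0-original", 40.0), None),
    "m1": (Attempt("m1-original", 300.0), None),  # straggler
}
with_spec = {
    "m0": (Attempt("m0-original", 40.0), None),
    "m1": (Attempt("m1-original", 300.0), Attempt("m1-speculative", 120.0)),
}
print(job_finish_time(no_spec))    # 300.0
print(job_finish_time(with_spec))  # 120.0
```

In this toy run the speculative copy of `m1` finishes first, so the original attempt is killed and the job completes at 120 instead of 300. If the original had been only slightly slow, it would have won and the backup would have been the one killed, having consumed resources for nothing.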

Please accept this answer if you found it helpful.