Spark jobs are divided into stages, which are in turn divided into tasks. A task is the unit of work executed on an executor, with the driver orchestrating the order of computation and the assignment of tasks to executors.
YARN handles resource management and supports preemption, which can take resources away from one application and give them to another.
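For context, preemption in YARN's CapacityScheduler is enabled by turning on the scheduler monitor and its preemption policy. A minimal sketch of the relevant `yarn-site.xml` properties (property names are from the Hadoop CapacityScheduler documentation; the surrounding queue capacities are assumed to be configured separately):

```xml
<!-- yarn-site.xml: enable the scheduler monitor that drives preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<!-- the policy that proportionally reclaims capacity from over-served queues -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
```

With this enabled, the ResourceManager reclaims containers from applications exceeding their queue's guaranteed capacity, which is what raises the question below.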
My question is: which of these units is actually preempted? Suppose an executor's current computation is cancelled and the executor starts serving a different application. Which unit gets cancelled?
If it's a task, a different executor may pick up the computation (but what happens to the intermediate results of the stage?). If it's a stage, the same question applies. And cancelling an entire job seems excessive.