Support Questions
Find answers, ask questions, and share your expertise

What is to be stored in memory overhead in Spark?



New Contributor

I am working with large data volumes, which is what Spark is meant for. Recently I was hitting an Executor Lost exception, and it was resolved by increasing the executor memoryOverhead. Can anyone help me understand the internal workings of memoryOverhead and what exactly the memory given to it is used for? I understand the equation used to derive memoryOverhead, but I am still in the dark about which objects are stored in this memory: do they belong to user-defined classes or to Spark's own classes?

On the first attempt the executors were lost and the task failed, while on the second attempt the tasks completed successfully. Why does this memory depend on my data volume? Auto-resubmission alone is not a solution, even though it is part of Spark's resilience. Also, please let me know how to reduce these objects, if they are the only reason all of this happened.
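For reference, the equation mentioned above is, in Spark 2.3+, the default value of `spark.executor.memoryOverhead` (formerly `spark.yarn.executor.memoryOverhead` on YARN): the larger of 384 MiB and 10% of the executor heap. This overhead covers off-heap allocations such as JVM internals, native buffers, and interned strings, which is why it tends to grow with data volume. A minimal sketch of that default calculation (the function name and MiB units are illustrative, not a Spark API):

```python
def default_memory_overhead_mib(executor_memory_mib, factor=0.10, minimum_mib=384):
    """Default off-heap overhead Spark requests per executor:
    max(384 MiB, 10% of executor heap memory)."""
    return max(minimum_mib, int(executor_memory_mib * factor))

# An 8 GiB executor gets max(384, 819) = 819 MiB of overhead by default.
print(default_memory_overhead_mib(8192))  # -> 819

# A small 1 GiB executor is clamped to the 384 MiB floor.
print(default_memory_overhead_mib(1024))  # -> 384
```

When the default is not enough, the value can be raised explicitly, e.g. `spark-submit --conf spark.executor.memoryOverhead=2g ...`, which is presumably what resolved the Executor Lost errors here.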
