We run a Spark application on a Hadoop cluster (HDP version 2.6.4, Spark version 2.2)
with the following settings:
Executor memory = 35G, Spark2 daemon memory = 50G
In the YARN logs we see many of these warnings:
WARN executor.Executor: Issue communicating with driver in heartbeater
What does this message mean: "executor.Executor: Issue communicating with driver in heartbeater"?
And is any additional Spark configuration needed to solve this issue?
The warning means that the executor failed to send a heartbeat to the driver (possibly because of a network issue). On its own it is only a warning, but each failure increments the heartbeat failure count, and once that count reaches the configured maximum the executor fails and exits with an error.
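To make the counting concrete, here is a minimal sketch of that behaviour in plain Python (not Spark code). The function name and the max_failures parameter are hypothetical, and the reset of the counter after a successful heartbeat is my assumption about how the executor behaves, so treat it as an illustration only.

```python
def simulate_heartbeats(results, max_failures=60):
    """Return the index of the heartbeat at which the executor would give up.

    results: iterable of booleans, True meaning the heartbeat reached the driver.
    Returns None if the failure limit is never reached.
    Assumption: a successful heartbeat resets the failure counter.
    """
    failures = 0
    for index, delivered in enumerate(results):
        if delivered:
            failures = 0  # assumed: success clears earlier failures
        else:
            failures += 1  # each failed heartbeat also logs the WARN line
            if failures >= max_failures:
                return index  # executor would exit here
    return None


# 60 failures in a row hit the limit on the last one (index 59).
print(simulate_heartbeats([False] * 60))
# A single success in between keeps the executor alive.
print(simulate_heartbeats([False] * 59 + [True] + [False] * 59))
```

Under that assumption, occasional warnings are harmless; only a long unbroken run of failures kills the executor.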
There are two configurations that we can tune to avoid this issue.
spark.executor.heartbeat.maxFailures (default value: 60)
Number of times an executor will try to send heartbeats to the driver before it gives up and exits (with exit code 56).
spark.executor.heartbeatInterval (default value: 10s)
Interval between each executor's heartbeats to the driver. Heartbeats let the driver know that the executor is still alive and update it with metrics for in-progress tasks. spark.executor.heartbeatInterval should be significantly less than spark.network.timeout.
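As a rough sketch of how these settings could be passed, the Python helper below builds the --conf arguments for spark-submit and refuses a heartbeat interval that is not below the network timeout. The helper names are hypothetical, the 120s default for spark.network.timeout is what I believe Spark ships with, and the duration parser handles only a few unit suffixes (ms, s, m, min, h). Note that it only checks "less than", whereas the recommendation above is "significantly less".

```python
import re

# Seconds per unit for the subset of duration suffixes handled here.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "min": 60, "h": 3600}


def to_seconds(value):
    """Convert a duration string such as '10s' or '2min' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|min|h)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def heartbeat_conf_args(max_failures=60, heartbeat_interval="10s",
                        network_timeout="120s"):
    """Build spark-submit --conf arguments for the heartbeat settings."""
    if to_seconds(heartbeat_interval) >= to_seconds(network_timeout):
        raise ValueError(
            "spark.executor.heartbeatInterval must be less than "
            "spark.network.timeout"
        )
    return [
        "--conf", f"spark.executor.heartbeat.maxFailures={max_failures}",
        "--conf", f"spark.executor.heartbeatInterval={heartbeat_interval}",
        "--conf", f"spark.network.timeout={network_timeout}",
    ]


# Example values only; pick numbers that suit your cluster.
print(" ".join(["spark-submit"] + heartbeat_conf_args(120, "20s", "300s")))
```

The same three properties can also be set in spark-defaults.conf or on a SparkConf before the context is created; the values in the example are placeholders, not a recommendation.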