We have a Spark job (version 1.4.1) that runs some basic Hive SQL SELECT/INSERT queries. Sometimes the driver kills all the allocated executors when a new stage starts, and the application then hangs idle for a long time (the job remains in RUNNING status). If I bring down the worker node where the driver is running, a new driver is started and it allocates all the executors again. What could cause the executors to be killed by the driver? There are no errors in the logs.
How did you determine that the driver kills all the allocated executors when the new stage starts? Do you have the application logs and event logs? Did you find any information in the application logs at the time the driver killed the executors? In which mode are you submitting the Spark job (client or cluster)? If it was submitted in client mode, do you have the driver logs, and can you share them with us?
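If event logging is not already enabled, one way to capture the information requested above is to turn it on at submit time. The snippet below is a sketch, not your actual submit command: the master URL, event-log directory, and job script are placeholders you would replace with your own values.

```shell
# Submit with event logging enabled so the event-log files (or the Spark
# history server) record when and why each executor was removed.
# Placeholders: master URL, log directory, and job script.
spark-submit \
  --master spark://your-master:7077 \
  --deploy-mode client \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs:///spark-event-logs \
  your_job.py \
  2>&1 | tee driver.log   # in client mode this also captures the driver log
```

In the resulting event log, look for `SparkListenerExecutorRemoved` entries and the removal reason they carry. Also worth checking: if dynamic allocation is enabled (`spark.dynamicAllocation.enabled=true`), idle executors are released after `spark.dynamicAllocation.executorIdleTimeout`, which between stages can look exactly like the driver "killing" all executors.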