01-30-2018 04:45 AM
Sorry this is nearly a year later, but the behavior you're seeing is likely because spark.executor.instances and --num-executors no longer disable dynamic allocation in 2.x. You now have to explicitly set spark.dynamicAllocation.enabled to false; otherwise Spark just uses the value of those properties as the initial executor count and still scales the number of executors up and down dynamically. That may or may not explain your situation, since you mentioned playing with some of those properties.

Additionally, the remaining ~13,000 tasks you describe don't necessarily mean there are 13,000 pending tasks for the current stage; a large portion of those could belong to future stages that depend on the current one. When you see the number of executors drop, it's likely that the current stage was not using all of the available executors, so the idle ones hit the idle timeout and were released.

If you want a static number of executors, explicitly disable dynamic allocation. You'll also want to check whether the task count is low at the time of the "decay" and look at the data to figure out why; that can often be resolved by simply repartitioning the RDD/Dataset/DataFrame in the stage that has the low number of partitions. Alternatively, there could be a resource-management configuration issue or even an actual bug behind the behavior, but I would start with the assumption that it's related to the config, data, or partitioning.
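A minimal sketch of the two fixes mentioned above, assuming Spark 2.x in Scala; the app name, input path, and counts are just illustrative placeholders, not values from your job:

import org.apache.spark.sql.SparkSession

// Explicitly disable dynamic allocation so the executor count stays fixed.
val spark = SparkSession.builder()
  .appName("static-executors-example")                  // hypothetical app name
  .config("spark.dynamicAllocation.enabled", "false")   // without this, 2.x only treats the value below as the initial count
  .config("spark.executor.instances", "50")             // now honored as a static executor count
  .getOrCreate()

// If a stage runs on too few partitions, repartition the DataFrame/Dataset/RDD
// that feeds it so the work is spread across all executors.
val df = spark.read.parquet("/path/to/input")           // hypothetical input path
val repartitioned = df.repartition(200)                 // pick a count roughly matching total executor cores

The same thing can be done at submit time with --conf spark.dynamicAllocation.enabled=false alongside --num-executors if you'd rather not change the application code.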