Member since: 04-07-2017
Posts: 10
Kudos Received: 0
Solutions: 0
01-30-2018 04:45 AM
Sorry this is nearly a year later, but the behavior you're seeing is likely because spark.executor.instances and --num-executors no longer disable dynamic allocation in Spark 2.x. You now have to set spark.dynamicAllocation.enabled to false explicitly; otherwise Spark only uses the value of those properties as the initial executor count and still scales the count up and down with dynamic allocation. That may or may not explain your situation, since you mentioned experimenting with some of those properties.

Additionally, the remaining ~13,000 tasks you describe don't necessarily mean there are 13,000 pending tasks; a large portion of those could belong to future stages that depend on the current one. When you see the number of executors drop, it's likely that the current stage wasn't using all the available executors, so they hit the idle timeout and were released.

If you want a static number of executors, explicitly disable dynamic allocation. It's also worth checking whether the task count is low at the time of the "decay" and looking at the data to figure out why; that can often be resolved simply by repartitioning the RDD/Dataset/DataFrame in the stage with the low partition count. Alternatively, there could be a resource management configuration issue or an actual bug behind the behavior, but I would start with the assumption that it's related to the config, data, or partitioning.
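For reference, a minimal sketch of what I mean (the app name, executor count, input path, and partition count below are all hypothetical, and these properties are just as commonly passed via spark-submit with --conf and --num-executors):

    import org.apache.spark.sql.SparkSession

    // Pin a static executor count in Spark 2.x: dynamic allocation must be
    // disabled explicitly, otherwise spark.executor.instances only seeds the
    // initial count and the executor count still scales up and down.
    val spark = SparkSession.builder()
      .appName("static-executors-example")                 // hypothetical app name
      .config("spark.dynamicAllocation.enabled", "false")  // turn off scaling
      .config("spark.executor.instances", "50")            // hypothetical static count
      .getOrCreate()

    // If a stage runs with too few partitions (leaving executors idle),
    // repartitioning the DataFrame before the expensive stage can help.
    val df = spark.read.parquet("/path/to/input")          // hypothetical input path
    val repartitioned = df.repartition(400)                // hypothetical partition count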
01-08-2018 11:36 AM
Thanks Ben. Also, are there any updates announced by Cloudera about CDH upgrades required for Meltdown or Spectre, apart from the OS patches? Thanks!
07-12-2017 08:24 AM
My issue here ended up being caused by the Amazon Elastic Load Balancer timing out the connection after 60 seconds.
04-25-2017 12:15 PM
Thanks @aarman; that worked wonderfully!