Member since
07-21-2016
10
Posts
2
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
24385 | 07-25-2016 10:23 AM |
08-01-2016
10:44 AM
Just want to support that answer. a repartition is not bad in any case since I have seen some interesting characteristics with kafka producers. If you use round robin for them they send data to a random partition and switch them every 10 min or so. So it is possible that a single Partition in Kafka will randomly get ALL the data and blow your spark application up ( Flume kafka connector was my example ). A repartition after the KafkaStream fixed that. You can parametrize this based on the number of executors etc.
... View more
07-25-2017
04:13 PM
Hello, HDP 2.6.1 has Spark 2, is dynamic resource allocation for streaming jobs working now?
... View more
07-28-2017
08:19 AM
We were into the same scenario where Zeppelin was always launching the 3 Containers in YARN even after having the Dynamic allocation parameters enabled from Spark but Zeppelin is not able to pick these parameters,
To get the Zeppelin to launch more than 3 containers (the default it is launching) we need to configure in the Zeppelin Spark interpreter spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.dynamicAllocation.initialExecutors=0
spark.dynamicAllocation.minExecutors=2 --> Start this value with the lower number, if not it will launch number of the minimum containers specified and will only use the required containers (memory and VCores) and rest of the memory and VCores will be marked as reserved memory and causes memory issues
spark.dynamicAllocation.maxExecutors=10
And it is always good to start with less executor memory (e.g 10/15g) and more executors (20/30) Our scenario we have observed that giving the executor memory (50/100g) and executors as (5/10) the query took 3min 48secs (228sec) --> which is obvious as the parallelism is very less and reducing the executor memory (10/15g) and increasing the executors (25/30) the same query took on 54secs. Please note the number of executors and executor memory are usecase dependent and we have done few trails before getting the optimal performance for our scenario.
... View more