
Spark 2.2.0 Scaling issues

Hi

We are running decision tree models in Spark + RStudio via sparklyr, and the model works up to a certain number of decision trees. When we increase the number of trees, Spark does not scale and fails with various issues. At one stage it asks for 164 tasks, but the Spark context never gets 164 tasks and the job just gets stuck there. We are running Spark in standalone mode on Docker containers, with 14 workers and 1 master. Below are the Spark properties we are using (a rough sketch of how we set them from sparklyr follows the list). The standalone cluster still has memory available. Please suggest any tuning properties that would help us. Thanks

spark.cores.max=82
spark.driver.memory=10g
spark.driver.maxResultSize=10g
spark.executor.memory=20g
spark.executor.cores=2
spark.network.timeout=800s
spark.rpc.askTimeout=800s
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.dynamicAllocation.minExecutors=10
spark.dynamicAllocation.maxExecutors=500
spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
spark.reducer.maxSizeInFlight=1024m
spark.shuffle.file.buffer=256k
spark.locality.wait=15s
spark.files.maxPartitionBytes=268435456
spark.sql.shuffle.partitions=1000
spark.default.parallelism=1000
spark.broadcast.compress=true
spark.io.compression.codec=lz4
spark.rdd.compress=true
spark.shuffle.compress=true
spark.shuffle.spill.compress=true
spark.memory.fraction=0.8
spark.memory.storageFraction=0.4
spark.scheduler.minRegisteredResourcesRatio=0.0
spark.scheduler.maxRegisteredResourcesWaitingTime=30s
spark.task.maxFailures=5
spark.shuffle.io.maxRetries=3
spark.shuffle.io.preferDirectBufs=true
spark.shuffle.io.retryWait=5s
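
For reference, this is roughly how we wire the properties in from sparklyr and fit the model. It is a minimal sketch: the master URL, the training_df data frame, the label ~ . formula, and the use of ml_random_forest with num_trees are placeholders standing in for our actual pipeline, and only a few of the properties above are repeated for brevity.

library(sparklyr)

# Build a spark_config() and copy in the properties listed above
# (remaining properties are set the same way).
conf <- spark_config()
conf$spark.cores.max                 <- 82
conf$spark.executor.memory           <- "20g"
conf$spark.executor.cores            <- 2
conf$spark.dynamicAllocation.enabled <- "true"
conf$spark.shuffle.service.enabled   <- "true"
conf$spark.sql.shuffle.partitions    <- 1000

# Connect to the standalone master (placeholder URL).
sc <- spark_connect(master  = "spark://spark-master:7077",
                    version = "2.2.0",
                    config  = conf)

# Copy the training data into Spark (training_df is a placeholder local data frame).
training_tbl <- sdf_copy_to(sc, training_df, name = "training", overwrite = TRUE)

# Fit the tree ensemble; num_trees is the value we keep increasing,
# and this is where the job stalls once the tree count gets large.
model <- ml_random_forest(training_tbl,
                          formula   = label ~ .,
                          type      = "classification",
                          num_trees = 200)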