Can anyone please suggest the best memory settings for a Spark job (number of executors, cores per executor, executor memory) on a Hortonworks cluster with the configuration below?

New Contributor

Number of nodes: 3

RAM: 256 GB per node

Cores: 56 per node

Disk: 60 TB per node

Please let me know if you need any other parameters.

Thanks,

Chandra

2 REPLIES

Re: Can anyone please suggest the best memory settings for a Spark job (number of executors, cores per executor, executor memory) on a Hortonworks cluster with the configuration below?

New Contributor

It really depends on your job. The default is usually 3 executors (which maps to three containers). Essentially, you need to be able to fit your working data in memory. For example, if your table is 300 GB, you would want enough executor memory to hold that table for optimal processing. If you are doing CPU-intensive work, then use something like 30 containers at 10 GB with 2 vCores each. I would suggest you experiment and optimize on a per-job basis.
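For illustration only, the two shapes described above could be sketched as Spark configuration like this (the exact sizes are placeholders, not recommendations for your cluster):

import org.apache.spark.SparkConf

// Memory-heavy shape: a few large executors, sized so the working set
// (e.g. a large table) can be held in memory across the executors.
val memoryHeavyConf = new SparkConf()
  .set("spark.executor.instances", "3")
  .set("spark.executor.memory", "100g")   // placeholder size
  .set("spark.executor.cores", "5")

// CPU-heavy shape: many smaller containers, e.g. 30 executors at 10 GB with 2 cores each.
val cpuHeavyConf = new SparkConf()
  .set("spark.executor.instances", "30")
  .set("spark.executor.memory", "10g")
  .set("spark.executor.cores", "2")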

Re: Can anyone please suggest the best memory settings for a Spark job (number of executors, cores per executor, executor memory) on a Hortonworks cluster with the configuration below?

Super Collaborator

Hi @Chandra Sekhar,

There is no rule of thumb or common practice for how many resources a job should be given; it depends purely on the job type and resource availability. The principle is to look at the job's execution plan carefully and allocate resources based on the scenarios described below.

Number of executors:

If there are parallel stages waiting to execute even though their predecessors have completed, consider increasing the number of executors so that all non-dependent stages can run in parallel.
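As a rough sketch (6 is only a placeholder value, not a recommendation), the executor count is controlled by spark.executor.instances:

import org.apache.spark.SparkConf

// More executors allow independent, non-dependent stages to run in parallel.
val conf = new SparkConf()
  .set("spark.executor.instances", "6")   // placeholder value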

Number of cores per executor:

If individual stages are compute-bound and their execution time is long, increase the cores per executor so that each executor can run more tasks concurrently.
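A minimal sketch (4 is a placeholder value):

import org.apache.spark.SparkConf

// More cores per executor lets each executor run more tasks concurrently,
// which helps when individual stages are compute-bound.
val conf = new SparkConf()
  .set("spark.executor.cores", "4")   // placeholder value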

Executor memory:

If you see GC errors and tasks being resubmitted (you may notice some tasks marked as skipped from the previous run), then you need to look at the executor memory and increase it.
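For example (the sizes are placeholders), the executor heap can be increased along with the off-heap overhead; note that the overhead key is spark.yarn.executor.memoryOverhead on older Spark-on-YARN versions and spark.executor.memoryOverhead from Spark 2.3 onwards:

import org.apache.spark.SparkConf

// Increase the executor heap, plus some overhead for off-heap allocations,
// when GC errors or resubmitted tasks point to memory pressure.
val conf = new SparkConf()
  .set("spark.executor.memory", "16g")           // placeholder size
  .set("spark.executor.memoryOverhead", "2g")    // Spark 2.3+ key; placeholder size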

Again, please be mindful that if you don't unpersist your RDDs or DataFrames, you may see an increase in GC time.
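A minimal sketch of the unpersist point (the input path is hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("unpersist-example").getOrCreate()

// Cache a DataFrame that several actions reuse, then release it once done,
// so the executors are not holding cached blocks that only add GC pressure.
val df = spark.read.parquet("/tmp/example")   // hypothetical input path
df.persist()
df.count()       // first action materialises the cache
// ... further actions reusing df ...
df.unpersist()   // free the cached blocks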

The best practice is to look at your execution plan carefully and optimize the resource allocation accordingly.
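For instance, the execution plan of a DataFrame can be printed before deciding on resources (again, the path is just a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("plan-inspection").getOrCreate()
val df = spark.read.parquet("/tmp/example")   // hypothetical input path

// Prints the parsed, analysed, optimised and physical plans, which helps judge
// how much parallelism and memory the job actually needs.
df.explain(true)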

You might find the following benchmark presentation helpful for understanding this in more depth.

https://www.slideshare.net/HadoopSummit/running-spark-in-production-61337353