Spark YARN Configuration on HDP 2.4 Recommendations

Expert Contributor

Hi Guys,

We have successfully configured Spark on YARN using Ambari on HDP 2.4 with default parameters. However, I would like to know which parameters we can tune for best performance. Should we have separate queues for Spark jobs? The use cases are yet to be decided, but the main goals are to replace old MR jobs and experiment with Spark Streaming, and we will probably also use DataFrames. How many Spark Thrift Server instances are recommended?

The cluster has 20 nodes, each with 256 GB RAM and 36 cores. Load from other jobs is generally around 5%.

Many thanks.

1 ACCEPTED SOLUTION

Please see the "Running Spark in Production" session from Hadoop Summit, Dublin, in particular the section on performance tuning.

Slides, video about executor selection

6 REPLIES

Expert Contributor

@Smart Solutions

Below is the official doc for Spark tuning on YARN:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_tuning-spark.html

Generally, people create queues to segregate resources between department groups within a company, or by application type such as ETL, real time, and so on. So it depends on your use case and on how you plan to share cluster resources between groups or applications. For the Spark Thrift Server, it's better to run a single instance per cluster unless you have hundreds of Thrift clients submitting jobs at the same time.
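As a rough illustration of the queue point, here is a small Python sketch that builds a spark-submit command line targeting a specific YARN queue. The queue name "etl" and the script name "etl_job.py" are made up for the example; `--master`, `--deploy-mode`, `--queue`, and `--conf` are standard spark-submit options, but check them against your Spark version.

```python
def spark_submit_args(app, queue, extra_conf=None):
    """Build a spark-submit argument list that targets one YARN queue.

    `app` and `queue` are whatever your site uses; the values in the
    usage example below are hypothetical.
    """
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--queue", queue,          # YARN queue the application is submitted to
    ]
    # Optional extra Spark properties, emitted in a stable (sorted) order.
    for key, value in sorted((extra_conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Hypothetical ETL job sent to a hypothetical "etl" queue.
cmd = spark_submit_args("etl_job.py", "etl")
print(" ".join(cmd))
```

The queue itself still has to exist in the YARN scheduler configuration; this only shows how an application picks one at submit time.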

Super Guru

With 256 GB per node, reserve at least 2 GB and 1 core for the OS, more if anything else is running on the node. Then start with 5 cores and 30 GB per executor. With 36 cores per node, that leaves 35 usable cores, so about 7 executors per node.
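The arithmetic behind that starting point can be sketched in a few lines of Python. This is only a back-of-the-envelope helper, not an official sizing tool: the function name and its defaults are assumptions taken from the numbers in this thread, and it ignores the per-container memory overhead YARN adds on top of executor memory as well as the container used by the application master.

```python
def executors_per_node(node_cores, node_mem_gb,
                       cores_per_executor=5, mem_per_executor_gb=30,
                       reserved_cores=1, reserved_mem_gb=2):
    """Estimate how many executors fit on one node.

    Defaults mirror the starting point suggested above; they are
    assumptions to tune, not fixed rules.
    """
    usable_cores = node_cores - reserved_cores    # leave cores for the OS
    usable_mem_gb = node_mem_gb - reserved_mem_gb  # leave memory for the OS
    by_cores = usable_cores // cores_per_executor
    by_mem = usable_mem_gb // mem_per_executor_gb
    # The scarcer resource decides how many executors fit.
    return min(by_cores, by_mem)


per_node = executors_per_node(node_cores=36, node_mem_gb=256)
cluster_total = per_node * 20  # 20 nodes, as described in the question

print(per_node)       # 7 (cores are the limit: 35 // 5)
print(cluster_total)  # 140, an upper bound if one job had the whole cluster
```

Here cores are the limiting factor; 7 executors at 30 GB use 210 GB, which leaves headroom on a 256 GB node for YARN overhead and other services.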
