Spark YARN Configuration on HDP 2.4 Recommendations
Labels: Apache Spark, Apache YARN
Created ‎05-18-2016 06:42 PM
Hi guys,
We have successfully configured Spark on YARN using Ambari on HDP 2.4 with the default parameters. However, I would like to know which parameters we can tune for best performance. Should we have separate queues for Spark jobs? The use cases are yet to be decided, but primarily we want to replace old MapReduce jobs, experiment with Spark Streaming, and probably also use DataFrames. How many Spark Thrift Server instances are recommended?
The cluster is 20 nodes, each with 256 GB RAM and 36 cores. Load from other jobs is generally around 5%.
Many thanks.
Created ‎05-18-2016 09:54 PM
Please see the "Running Spark in Production" session from Hadoop Summit, Dublin, in particular the section on performance tuning.
Created ‎05-18-2016 06:44 PM
Created ‎05-18-2016 08:06 PM
Below is the official doc for Spark tuning on YARN:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_tuning-spark.html
Generally, people create queues either to segregate resources between different department groups within a company, or by application type (ETL, real-time, and so on). So it depends on your use case and how you plan to share cluster resources between groups and applications. For the Spark Thrift Server, it is better to run a single instance per cluster unless you have hundreds of Thrift clients submitting jobs at the same time.
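To illustrate the queue-based segregation described above, here is a minimal Capacity Scheduler sketch. The queue name "spark" and the 50/50 capacity split are placeholder assumptions, not a recommendation from this thread; on HDP these properties are normally managed through Ambari rather than edited by hand.

```xml
<!-- Fragment of capacity-scheduler.xml: split the root queue into
     "default" and a hypothetical "spark" queue with equal capacity. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,spark</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.spark.capacity</name>
  <value>50</value>
</property>
```

Jobs can then be directed to the queue at submission time, e.g. `spark-submit --master yarn --queue spark ...`.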
Created ‎05-18-2016 09:54 PM
Created ‎05-19-2016 02:13 AM
Created ‎05-19-2016 04:40 AM
If you have 256 GB per node, leave at least 2 GB and 1 core for the OS, more if anything else runs on the node. Then start with 5 cores and 30 GB per executor, which gives about 7 executors per node.
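The sizing above can be checked with some quick shell arithmetic. The 36 cores / 256 GB per node come from the question; the 5-core, 30 GB executor shape and the OS reservation are the suggestions in this reply.

```shell
# Back-of-the-envelope executor sizing per the advice above
NODE_CORES=36
NODE_MEM_GB=256
RESERVED_CORES=1       # left for the OS
RESERVED_MEM_GB=2
CORES_PER_EXECUTOR=5
MEM_PER_EXECUTOR_GB=30

EXECUTORS_PER_NODE=$(( (NODE_CORES - RESERVED_CORES) / CORES_PER_EXECUTOR ))
MEM_USED_GB=$(( EXECUTORS_PER_NODE * MEM_PER_EXECUTOR_GB ))

echo "executors per node: ${EXECUTORS_PER_NODE}"
echo "memory committed: ${MEM_USED_GB} GB of $(( NODE_MEM_GB - RESERVED_MEM_GB )) GB usable"
```

This yields 7 executors per node using 210 GB of the 254 GB left after the OS reservation, leaving headroom for YARN overhead and other services.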
Created ‎05-19-2016 11:37 AM
Thanks @vshukla, @Timothy Spann, @Jitendra Yadav, @Yuta Imai
