How to configure spark-submit configurations based on file size
Labels: Apache Spark
Created 09-17-2018 05:49 AM
Hi All,
We know there are formulas to determine a Spark job's executor memory, number of executors, and executor cores based on the cluster's available resources. Is there also a formula to calculate the same values based on data size?
case 1: what is the configuration if: data size < 5 GB
case 2: what is the configuration if: 5 GB ≤ data size < 10 GB
case 3: what is the configuration if: 10 GB ≤ data size < 15 GB
case 4: what is the configuration if: 15 GB ≤ data size < 25 GB
case 5: what is the configuration if: data size ≥ 25 GB
Cheers,
MJ
Created 09-17-2018 05:57 AM
It is hard to give exact values from data size alone. However, you can refer to the following article to understand executor memory/core/resource optimization; a sketch of the formula it describes follows below.
https://dzone.com/articles/apache-spark-on-yarn-resource-planning
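A minimal Python sketch of that style of resource-planning formula, assuming a hypothetical cluster of 6 nodes with 16 cores and 64 GB per node. The reserved core/GB per node, the 5-cores-per-executor value, and the ~7% memory overhead are assumptions taken from planning guides like the one above, not values from this thread, so adjust them for your cluster:

```python
# A sketch of the commonly cited YARN executor-planning formula.
# All node sizes and tuning constants below are hypothetical assumptions.

def executor_plan(num_nodes, cores_per_node, mem_per_node_gb,
                  cores_per_executor=5, overhead_fraction=0.07):
    """Derive --num-executors / --executor-cores / --executor-memory."""
    usable_cores = cores_per_node - 1      # leave 1 core for OS/Hadoop daemons
    usable_mem_gb = mem_per_node_gb - 1    # leave ~1 GB for OS/Hadoop daemons

    executors_per_node = usable_cores // cores_per_executor
    total_executors = executors_per_node * num_nodes - 1  # 1 slot for the YARN AM

    # YARN adds spark.yarn.executor.memoryOverhead on top of --executor-memory,
    # so subtract that overhead from the per-executor container budget.
    mem_per_executor_gb = usable_mem_gb / executors_per_node
    executor_memory_gb = int(mem_per_executor_gb * (1 - overhead_fraction))

    return total_executors, cores_per_executor, executor_memory_gb

# Example: hypothetical 6-node cluster, 16 cores and 64 GB per node.
num_executors, executor_cores, executor_memory = executor_plan(6, 16, 64)
print(f"--num-executors {num_executors} "
      f"--executor-cores {executor_cores} "
      f"--executor-memory {executor_memory}G")
# -> --num-executors 17 --executor-cores 5 --executor-memory 19G
```

Roughly speaking, data size tends to drive the number of partitions (and therefore tasks) rather than these three settings, which is one reason exact per-size-range values are hard to state.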
