How to configure spark-submit settings based on file size

Hi All,

We know there are formulas for determining a Spark job's executor memory, number of executors, and executor cores based on the resources available in your cluster. Is there a similar formula for calculating these settings based on data size?

Case 1: what is the configuration if data size < 5 GB?

Case 2: what is the configuration if 5 GB < data size < 10 GB?

Case 3: what is the configuration if 10 GB < data size < 15 GB?

Case 4: what is the configuration if 15 GB < data size < 25 GB?

Case 5: what is the configuration if data size > 25 GB?

Cheers,

MJ

1 REPLY

@Manikandan Jeyabal

It is hard to give exact values based on data size alone.

However, you can refer to the following articles to understand executor memory, core, and resource optimization:

https://community.hortonworks.com/articles/42803/spark-on-yarn-executor-resource-allocation-optimiz....

https://dzone.com/articles/apache-spark-on-yarn-resource-planning
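
To make the idea concrete, here is a rough sketch of how such a calculation could look. It is not taken from the articles above and is not an official formula. It assumes the commonly quoted rule of thumb (5 cores per executor, 1 core and 1 GB per node left for the OS, one executor slot left for the YARN ApplicationMaster, about 10% of memory held back for overhead) and assumes roughly one input partition per 128 MB of data. The function names (plan_from_cluster, partitions_for, cap_by_data_size, to_flags) and the 10-node cluster in the example are hypothetical. Data size is used only to cap the executor count, not to size memory, because memory needs depend on what the job does (joins, caching, shuffles) rather than on input size alone.

```python
import math
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int


def plan_from_cluster(nodes, cores_per_node, mem_per_node_gb,
                      cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb upper bound from cluster resources (assumed heuristic)."""
    # Assumption: keep 1 core and 1 GB per node for the OS and node daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_per_node_gb - 1
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("too few cores per node for one executor")
    # Assumption: give up one executor slot to the YARN ApplicationMaster.
    num_executors = max(1, nodes * executors_per_node - 1)
    # Assumption: hold back a fraction of each slot for off-heap overhead.
    slot_mem_gb = usable_mem_gb / executors_per_node
    heap_gb = math.floor(slot_mem_gb * (1 - overhead_fraction))
    return ExecutorPlan(num_executors, cores_per_executor, heap_gb)


def partitions_for(data_size_gb, target_partition_mb=128):
    """Estimate input partitions, assuming roughly one per 128 MB of data."""
    return max(1, math.ceil(data_size_gb * 1024 / target_partition_mb))


def cap_by_data_size(plan, data_size_gb):
    """Do not request more task slots than there are partitions to process."""
    needed = math.ceil(partitions_for(data_size_gb) / plan.executor_cores)
    return ExecutorPlan(min(plan.num_executors, needed),
                        plan.executor_cores, plan.executor_memory_gb)


def to_flags(plan):
    """Render the plan as spark-submit options for YARN."""
    return (f"--num-executors {plan.num_executors} "
            f"--executor-cores {plan.executor_cores} "
            f"--executor-memory {plan.executor_memory_gb}g")


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, 16 cores and 64 GB each.
    cluster = plan_from_cluster(nodes=10, cores_per_node=16, mem_per_node_gb=64)
    for size_gb in (5, 10, 15, 25):
        print(size_gb, "GB:", to_flags(cap_by_data_size(cluster, size_gb)))
```

With the hypothetical cluster above, a 5 GB input is estimated at 40 partitions and is capped at 8 executors, while a 25 GB input (200 partitions) can use the full cluster bound of 29 executors. Treat the output as a starting point to tune from, not as final values.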