Support Questions

manikandanjeyab · ‎09-17-2018

Hi All,

We know there are formulas for determining a Spark job's "executor memory", "number of executors" and "executor cores" from the cluster's available resources. Is there a formula to calculate the same settings from the data size alone?

case 1: what is the configuration if: data size < 5 GB

case 2: what is the configuration if: 5 GB < data size < 10 GB

case 3: what is the configuration if: 10 GB < data size < 15 GB

case 4: what is the configuration if: 15 GB < data size < 25 GB

case 5: what is the configuration if: data size > 25 GB

Cheers,

MJ

jsensharma · ‎09-17-2018

@Manikandan Jeyabal

It is hard to give exact values based on the data size alone.

However, you can refer to the following articles to understand executor memory, core, and resource optimization.

https://community.hortonworks.com/articles/42803/spark-on-yarn-executor-resource-allocation-optimiz....

https://dzone.com/articles/apache-spark-on-yarn-resource-planning
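
As a rough illustration of the kind of cluster-based formula mentioned in the question, here is a minimal Python sketch. It is not taken from the linked articles. It assumes a widely repeated rule of thumb: about 5 cores per executor, 1 core and 1 GB per node reserved for the OS and Hadoop daemons, one executor slot left for the YARN ApplicationMaster, and roughly 10% of each executor's memory set aside as overhead. The function names and the example cluster (10 nodes, 16 cores, 64 GB each) are hypothetical. The second helper only estimates how many input partitions a file of a given size would produce, assuming 128 MB splits; it does not turn data size into executor settings.

```python
import math


def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb executor sizing from cluster resources (an assumed heuristic)."""
    # Assumption: reserve 1 core and 1 GB per node for the OS and Hadoop daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_gb_per_node - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Assumption: leave one executor slot for the YARN ApplicationMaster.
    num_executors = max(1, executors_per_node * nodes - 1)

    # Each slot holds heap plus overhead, so heap = slot / (1 + overhead_fraction).
    mem_per_slot_gb = usable_mem_gb / executors_per_node
    executor_memory_gb = math.floor(mem_per_slot_gb / (1 + overhead_fraction))

    return {
        "num_executors": num_executors,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": executor_memory_gb,
    }


def estimate_input_partitions(data_size_gb, split_mb=128):
    """Approximate input partition count, assuming 128 MB splits."""
    return max(1, math.ceil(data_size_gb * 1024 / split_mb))


def spark_submit_flags(sizing):
    """Format the sizing as spark-submit options."""
    return (f"--num-executors {sizing['num_executors']} "
            f"--executor-cores {sizing['executor_cores']} "
            f"--executor-memory {sizing['executor_memory_gb']}g")


if __name__ == "__main__":
    sizing = size_executors(nodes=10, cores_per_node=16, mem_gb_per_node=64)
    print(spark_submit_flags(sizing))
    for size_gb in (5, 10, 15, 25):
        print(size_gb, "GB ->", estimate_input_partitions(size_gb), "partitions")
```

The partition estimate can be compared with the total number of cores (executors times cores per executor) to judge whether a small input would leave most of the cluster idle. Treat the numbers as a starting point and tune against the actual job.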

Cloudera Community

how to configure spark submit configurations based on File size