How to configure spark-submit settings based on file size

Hi All,

We know there are formulas for determining a Spark job's executor memory, number of executors, and executor cores based on the resources available in the cluster. Is there any formula for calculating the same values based on data size?

Case 1: what is the configuration if data size < 5 GB?

Case 2: what is the configuration if 5 GB < data size < 10 GB?

Case 3: what is the configuration if 10 GB < data size < 15 GB?

Case 4: what is the configuration if 15 GB < data size < 25 GB?

Case 5: what is the configuration if data size > 25 GB?

Cheers,

MJ

Re: How to configure spark-submit settings based on file size

@Manikandan Jeyabal

It is hard to provide the exact values based on the data size.

However, you can refer to the following articles to understand executor memory, core, and resource optimization.

https://community.hortonworks.com/articles/42803/spark-on-yarn-executor-resource-allocation-optimiz....

https://dzone.com/articles/apache-spark-on-yarn-resource-planning
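
As a rough starting point only, here is a minimal Python sketch of how the usual cluster-based rule of thumb could be combined with the data size. Everything in it is an assumption rather than an official formula: the function name `suggest_spark_submit_conf` is hypothetical, the defaults (5 cores per executor, 1 core and 1 GB reserved per node, one executor slot left for the YARN ApplicationMaster, about 10% memory overhead with a 384 MB floor, 128 MB per input partition) are common heuristics, and the right values depend heavily on what the job actually does (joins, shuffles, caching).

```python
import math


def suggest_spark_submit_conf(data_size_gb, nodes, cores_per_node, mem_per_node_gb,
                              cores_per_executor=5, partition_mb=128):
    """Heuristic starting values for spark-submit; all defaults are assumptions."""
    # Assumption: reserve 1 core and 1 GB per node for the OS and Hadoop daemons.
    usable_cores = max(1, cores_per_node - 1)
    usable_mem_gb = max(1, mem_per_node_gb - 1)

    executor_cores = min(cores_per_executor, usable_cores)
    executors_per_node = max(1, usable_cores // executor_cores)

    # Assumption: leave one executor slot for the YARN ApplicationMaster.
    cluster_max_executors = max(1, executors_per_node * nodes - 1)

    # Split node memory across executors, then subtract the off-heap overhead,
    # approximated here as max(384 MB, 10% of the container).
    container_mem_gb = usable_mem_gb / executors_per_node
    overhead_gb = max(0.384, 0.10 * container_mem_gb)
    executor_memory_gb = max(1, math.floor(container_mem_gb - overhead_gb))

    # Data-size part: roughly one task per input partition (assumed 128 MB each).
    # Asking for more cores than partitions would leave cores idle on the first stage.
    partitions = max(1, math.ceil(data_size_gb * 1024 / partition_mb))
    executors_for_data = math.ceil(partitions / executor_cores)

    return {
        "num_executors": min(cluster_max_executors, executors_for_data),
        "executor_cores": executor_cores,
        "executor_memory_gb": executor_memory_gb,
        "input_partitions": partitions,
    }


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, 16 cores and 64 GB each; 4 GB of input.
    conf = suggest_spark_submit_conf(4, nodes=10, cores_per_node=16, mem_per_node_gb=64)
    print("spark-submit --num-executors {num_executors} "
          "--executor-cores {executor_cores} "
          "--executor-memory {executor_memory_gb}g ...".format(**conf))
```

With these assumed numbers, 4 GB of input comes out at 7 executors with 5 cores and 18 GB each, while anything above roughly 18 GB is capped by the cluster at 29 executors. Treat the result as a first guess to refine by watching the job in the Spark UI, not as a fixed answer for each size range.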