Hello all, In Hadoop MapReduce, the number of mappers created depends, by default, on the number of input splits. For example, if your input file is 192 MB and one block is 64 MB, there will be 3 input splits, and therefore 3 mappers. In the same way, I would like to know: in Spark, if I submit an application to a standalone cluster (a sort of pseudo-distributed setup) to process 750 MB of input data, how many executors will be created?
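As a rough sketch of the split arithmetic in the question: the count is a ceiling division of file size by block size. This assumes one split per block and sizes given in whole MB; `num_splits` is only an illustrative helper name, not a Hadoop or Spark command.

```shell
# Number of input splits = ceil(file size / block size), both in MB.
# Assumes one split per block; real jobs can change this via split-size settings.
num_splits() {
  file_mb=$1
  block_mb=$2
  # Integer ceiling division: add (block - 1) before dividing.
  echo $(( (file_mb + block_mb - 1) / block_mb ))
}

num_splits 192 64   # the 192 MB example with 64 MB blocks
num_splits 750 64   # the 750 MB input, assuming the same 64 MB block size
```

With 64 MB blocks this gives 3 splits for the 192 MB file and 12 for the 750 MB input.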
If you are running in cluster mode, you need to set the number of executors when submitting the JAR, or you can set it manually in the code. The former is the better way:
spark-submit \
  --master yarn-cluster \
  --class com.yourCompany.code \
  --executor-memory 32G \
  --num-executors 5 \
  --driver-memory 4g \
  --executor-cores 3 \
  --queue parsons \
  YourJARfile.jar
If running interactively from spark-shell:
spark-shell --master yarn --num-executors 6 --driver-memory 5g --executor-memory 7g
@Saravanan Selvam, in YARN mode you can control the total number of executors for an application with the --num-executors option.
However, if you do not explicitly specify --num-executors for a Spark application in YARN mode, it would typically start one executor on each NodeManager.
Spark also has a feature called dynamic resource allocation. It lets a Spark application scale the set of cluster resources allocated to it up and down based on the workload. This way you can make sure that the application is not over-utilizing resources.
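A minimal sketch of how dynamic allocation might be switched on at submit time, instead of passing a fixed --num-executors. The executor bounds (1 and 10) and the helper name `dynamic_allocation_cmd` are illustrative assumptions; the class and JAR names are the placeholders used earlier in this thread. It also assumes the external shuffle service is already set up on the NodeManagers, which `spark.shuffle.service.enabled=true` relies on. `--master yarn --deploy-mode cluster` is the Spark 2.0+ spelling of `yarn-cluster`. The helper only prints the command so it can be checked first.

```shell
# Print (do not run) a spark-submit command that uses dynamic allocation.
# $1 = minimum executors, $2 = maximum executors (illustrative bounds).
dynamic_allocation_cmd() {
  min_executors=$1
  max_executors=$2
  echo "spark-submit" \
    "--master yarn" \
    "--deploy-mode cluster" \
    "--class com.yourCompany.code" \
    "--conf spark.dynamicAllocation.enabled=true" \
    "--conf spark.shuffle.service.enabled=true" \
    "--conf spark.dynamicAllocation.minExecutors=${min_executors}" \
    "--conf spark.dynamicAllocation.maxExecutors=${max_executors}" \
    "YourJARfile.jar"
}

# Review the printed command before submitting it.
dynamic_allocation_cmd 1 10
```

Once the printed command looks right for your cluster, it can be pasted into the terminal. Spark then requests and releases executors between the two bounds as the workload changes.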