How many executors will be created for a Spark application?
Labels: Apache Spark

Created 04-04-2017 09:13 AM
Hello All, in Hadoop MapReduce the number of mappers created depends, by default, on the number of input splits. For example, if your input file is 192 MB and one block is 64 MB, there will be 3 input splits, and so 3 mappers. In the same way, I would like to know: in Spark, if I submit an application to a standalone cluster (a sort of pseudo-distributed setup) to process 750 MB of input data, how many executors will be created?
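(A point of context worth checking while reading the answers below: in Spark, the input split count determines the number of partitions of the input RDD, i.e. the number of tasks, rather than the number of executors. A minimal spark-shell sketch to verify this, assuming a hypothetical 750 MB file at /data/input.txt:)

// spark-shell sketch: input splits map to RDD partitions (tasks), not executors.
// /data/input.txt is a hypothetical 750 MB file; sc is the shell's SparkContext.
val rdd = sc.textFile("/data/input.txt")
println(rdd.getNumPartitions) // e.g. ~6 partitions with a 128 MB block size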
Created 04-04-2017 03:49 PM
If you are running in cluster mode, you need to set the number of executors when submitting the JAR, or you can set it manually in the code. The former approach is better.
spark-submit \
  --master yarn-cluster \
  --class com.yourCompany.code \
  --executor-memory 32G \
  --num-executors 5 \
  --driver-memory 4g \
  --executor-cores 3 \
  --queue parsons \
  YourJARfile.jar
If running interactively via spark-shell, the same options apply:
spark-shell --master yarn --num-executors 6 --driver-memory 5g --executor-memory 7g
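A minimal sketch of the in-code alternative mentioned above (the property names are Spark's standard ones; the app name and values are illustrative, not recommendations):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: fixing the executor count programmatically instead of on the command line.
// The master is assumed to be supplied by spark-submit.
val conf = new SparkConf()
  .setAppName("ExecutorCountExample")   // hypothetical application name
  .set("spark.executor.instances", "5") // equivalent to --num-executors 5
  .set("spark.executor.memory", "7g")   // equivalent to --executor-memory 7g
  .set("spark.executor.cores", "3")     // equivalent to --executor-cores 3
val sc = new SparkContext(conf)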
Created 04-04-2017 06:53 PM
@Saravanan Selvam, in YARN mode you can control the total number of executors needed for an application with the --num-executors option.
However, if you do not explicitly specify --num-executors for a Spark application in YARN mode, it will typically start one executor on each NodeManager.
Spark also has a feature called dynamic resource allocation. It lets a Spark application dynamically scale the set of cluster resources allocated to it up and down based on the workload, so you can make sure the application is not over-utilizing cluster resources.
http://spark.apache.org/docs/1.2.0/job-scheduling.html#dynamic-resource-allocation
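A minimal sketch of the relevant settings, assuming YARN with the external shuffle service available on each NodeManager; the executor bounds are illustrative:

import org.apache.spark.SparkConf

// Sketch: enabling dynamic resource allocation so the executor count
// scales with the workload instead of being fixed up front.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")      // required so executors can be removed safely
  .set("spark.dynamicAllocation.minExecutors", "1")  // illustrative lower bound
  .set("spark.dynamicAllocation.maxExecutors", "10") // illustrative upper bound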
