Number of intermediate files with Sort shuffle in Spark
Labels: Apache Spark, HDFS
Created on 07-18-2015 10:35 AM - edited 09-16-2022 02:34 AM
Hi everyone!
I'm trying to understand sort shuffle in Spark and would really appreciate it if someone could answer a simple question. Let's imagine:
1) I have 600 partitions (HDFS blocks, for simplicity).
2) They are stored on a 6-node cluster.
3) I run Spark with the following parameters:
--executor-memory 13G --executor-cores 6 --num-executors 12 --driver-memory 1G --properties-file my-config.conf
That means each server will have 2 executors with 6 cores each.
4) According to my config, the reduce phase has only 3 reducers.
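The cluster arithmetic in points 1 to 4 can be checked with a short Python sketch. It assumes, as the post does for simplicity, that executors and HDFS blocks are spread evenly across the six nodes; the function name `cluster_layout` is hypothetical.

```python
# Values copied from the question's spark-submit flags and setup.
NUM_NODES = 6
NUM_EXECUTORS = 12      # --num-executors 12
EXECUTOR_CORES = 6      # --executor-cores 6
NUM_PARTITIONS = 600    # input partitions (HDFS blocks) = map tasks


def cluster_layout(nodes, executors, cores, partitions):
    """Per-node figures, ASSUMING executors and input blocks are
    distributed evenly across nodes (the post's simplification)."""
    executors_per_node = executors // nodes
    # Each core runs one task at a time, so this is the number of
    # map tasks that can be active simultaneously on one node.
    concurrent_tasks_per_node = executors_per_node * cores
    # Total map tasks that eventually run on one node.
    map_tasks_per_node = partitions // nodes
    return executors_per_node, concurrent_tasks_per_node, map_tasks_per_node


print(cluster_layout(NUM_NODES, NUM_EXECUTORS, EXECUTOR_CORES, NUM_PARTITIONS))
# (2, 12, 100)
```

The three numbers correspond to the three candidate answers below: executors per node, concurrently active map tasks per node, and total map tasks per node.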
So my question is: how many files will there be on each server after the sort shuffle?
- 12, like the number of concurrently active map tasks
- 2, like the number of executors on each server
- 100, like the number of partitions stored on this server (for simplicity I just divided 600 by 6)
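A sketch for comparing the candidate answers, under a loudly stated assumption: that Spark's sort-based shuffle (the default since Spark 1.2) leaves one data file plus one index file per map task once the task finishes, with any intermediate spill files merged into that single data file. Under that assumption the final count follows the total number of map tasks run on a node, not the executor count or the number of concurrent tasks, and the 3 reducers only determine how many segments each data file is divided into. The helper names are hypothetical; the hash-shuffle function is included only for contrast.

```python
def sort_shuffle_files_per_node(map_tasks_per_node, with_index=True):
    """Final shuffle output files on one node, ASSUMING sort-based shuffle
    writes exactly one data file (plus one index file) per map task.
    Temporary spill files that exist while a task is running are ignored."""
    files_per_task = 2 if with_index else 1
    return map_tasks_per_node * files_per_task


def hash_shuffle_files_per_node(map_tasks_per_node, reducers):
    """For contrast: the older hash shuffle without file consolidation
    writes one file per (map task, reducer) pair."""
    return map_tasks_per_node * reducers


NUM_REDUCERS = 3
map_tasks_per_node = 600 // 6  # 100, per the post's simplification

print(sort_shuffle_files_per_node(map_tasks_per_node))                    # 200
print(sort_shuffle_files_per_node(map_tasks_per_node, with_index=False))  # 100
print(hash_shuffle_files_per_node(map_tasks_per_node, NUM_REDUCERS))      # 300
```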
And the second question: what is the name of the buffer that stores intermediate data in the map stage before it is spilled to disk?
Thanks!
Created 08-10-2015 05:07 PM
