Member since
07-04-2018
2
Posts
1
Kudos Received
0
Solutions
02-23-2019
06:00 PM
for example, in one of my DAG, all that those task do is Sort WithinPartition (so no shuffle) still it spills data on disk because partition size is huge and spark resort to ExternalMergeSort. As a result, I have a high Shuffle Spill (memor) and also some Shuffle Spill(Disk). There is no shuffle here. but on the other hand you can argue that Sorting process moves data in order to sort so it's kind of internal shuffle 🙂
... View more