04-18-2019 08:47 PM
I am trying to understand Spark's shuffle strategies.
Does Spark use different shuffle strategies for Spark SQL and DataFrames?
From my understanding, for Catalyst (Spark SQL), the Exchange plan (shuffle) always uses HashPartitioning. Here the number of shuffle partitions (reducers) is controlled by the spark.sql.shuffle.partitions parameter.
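To make my mental model of hash partitioning concrete, here is a minimal plain-Python sketch (not Spark code). The names hash_partition and shuffle are hypothetical, and CRC32 is only a deterministic stand-in for whatever hash Spark applies internally (as far as I know, a Murmur3-based one). The num_partitions argument plays the role of the shuffle partition setting.

```python
import zlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition id in [0, num_partitions).

    Assumption: CRC32 stands in for Spark's internal hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(rows, num_partitions):
    """Group (key, value) rows into num_partitions buckets by key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash_partition(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical sample data: rows sharing a key must land in the same partition.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = shuffle(rows, 4)
for i, part in enumerate(parts):
    print(i, part)
```

If this picture is right, every row with the same key ends up in the same reducer, and changing the partition count only changes how many buckets exist.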
In the case of DataFrames, i.e. without Catalyst involvement, Spark uses spark.shuffle.manager (normally sort).
Please correct me if my understanding is wrong.