Dil
New Contributor
Posts: 1
Registered: ‎04-18-2019

Spark shuffle strategies for Spark SQL and DataFrames

Hello,

I am trying to understand Spark's shuffle strategies.
Does Spark use different shuffle strategies for Spark SQL and for DataFrames?

From my understanding, for Catalyst (Spark SQL) the Exchange plan node (the shuffle) always uses HashPartitioning. Here the number of shuffle partitions, i.e. reducers, is controlled by the spark.sql.shuffle.partitions parameter.
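To make clear what I mean by hash partitioning, here is a minimal pure-Python sketch of the idea. It is not Spark's code: `partition_for` and `shuffle` are names I made up for illustration, and CRC32 is only a stand-in for whatever hash function Spark actually applies. The point is just that each row goes to partition hash(key) mod numPartitions, so the partition count fixes the number of reducers.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition id in [0, num_partitions)."""
    # CRC32 is a stand-in hash (assumption, not Spark's hash);
    # unlike Python's built-in hash() for strings, it is stable across runs.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle(rows, num_partitions):
    """Group (key, value) rows by target partition, as each reducer would see them."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


# num_partitions plays the role of the shuffle partition setting here.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
result = shuffle(rows, num_partitions=4)

for pid in sorted(result):
    print(pid, result[pid])
```

All rows with the same key end up in the same partition, which is what I assume a grouped aggregation or join relies on.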

In the case of DataFrames, i.e. without Catalyst involvement, Spark uses the strategy set by spark.shuffle.manager (normally sort).

Please correct me if my understanding is wrong.

 

Regards