What does the shuffling phase actually do?

What does the shuffling phase actually do?

A) Since shuffling is the process of bringing the mapper output to the reducer input, does it just bring specific keys from the mappers to particular reducers, based on the code written in the partitioner?

E.g. the output of mapper 1 is {a,1} {b,1}

and the output of mapper 2 is {a,1} {b,1}

In my partitioner, I have written that all keys starting with 'a' go to reducer 1 and all keys starting with 'b' go to reducer 2, so the input to the reducers would be:

reducer 1: {a,1}{a,1}

reducer 2: {b,1}{b,1}
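
To make option A concrete, here is a minimal sketch in plain Python. It is not Hadoop code: `partition_by_first_letter` and `shuffle_without_grouping` are hypothetical names that only mimic the partitioner rule described above, and nothing is grouped or sorted.

```python
# Toy simulation of option A: route each (key, value) pair to a reducer,
# without grouping. This is an illustration, not the Hadoop API.

def partition_by_first_letter(key):
    """Hypothetical partitioner: 'a...' keys -> reducer 1, 'b...' keys -> reducer 2."""
    if key.startswith("a"):
        return 1
    if key.startswith("b"):
        return 2
    raise ValueError(f"no reducer assigned for key {key!r}")


def shuffle_without_grouping(mapper_outputs):
    """Collect every pair from every mapper into the list of its target reducer."""
    reducers = {1: [], 2: []}
    for output in mapper_outputs:
        for key, value in output:
            reducers[partition_by_first_letter(key)].append((key, value))
    return reducers


mapper_1 = [("a", 1), ("b", 1)]
mapper_2 = [("a", 1), ("b", 1)]
result_a = shuffle_without_grouping([mapper_1, mapper_2])
print(result_a)
```

Under these assumptions each reducer simply ends up with a flat list of pairs, which is the picture in option A.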

B) Or, along with the above process, does it also group the values by key?

In that case the input to the reducers would be:

reducer 1: {a,[1,1]}

reducer 2: {b,[1,1]}

In my opinion it should be just A, because grouping of keys must take place after sorting: sorting is done only so that the reducer can easily tell where one key ends and the next one begins. If so, when does the grouping of keys actually happen? Please elaborate.
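
My reasoning about sorting can be sketched in plain Python as well. This is only a toy model of the idea, not how Hadoop implements it; `group_runs` is a hypothetical helper, and `itertools.groupby` stands in for "notice where one key ends and the next begins".

```python
from itertools import groupby
from operator import itemgetter


def group_runs(pairs):
    """Group consecutive pairs that share a key into (key, [values])."""
    return [
        (key, [value for _, value in run])
        for key, run in groupby(pairs, key=itemgetter(0))
    ]


# Pairs as they might arrive at a single reducer, in no particular order.
reducer_input = [("b", 1), ("a", 1), ("b", 1), ("a", 1)]

# Without sorting, equal keys are not adjacent, so each key is split into several runs.
unsorted_groups = group_runs(reducer_input)

# After sorting by key, equal keys are adjacent and one pass finds each key boundary.
sorted_groups = group_runs(sorted(reducer_input, key=itemgetter(0)))

print(unsorted_groups)
print(sorted_groups)
```

In this sketch the grouping only yields one entry per key once the pairs are sorted, which is why I think grouping has to come after the sort.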

Thanks
