Did You Know ... About Secondary Sorts?

Cloudera Employee

Secondary sorts are a way to group data together in a reduce and control the order in which each group's values arrive, so the reducer can stream through them instead of collecting them first. If you find yourself buffering data in your reducer, as in this example, you should be using a secondary sort. Buffering data when you're dealing with Big Data is a recipe for an OutOfMemoryError. Here's a full example showing a secondary sort on playing cards.
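To make the mechanics concrete, here is a minimal plain-Java sketch that simulates the shuffle in memory instead of using Hadoop classes. It assumes a hypothetical composite key `CardKey` (suit as the natural key, rank as the secondary key), a partitioner and grouping comparator that look only at the suit, and a sort comparator that orders by suit and then rank. The hypothetical reducer logic reports missing cards in each suit while holding only the previous rank in memory, which works only because the ranks arrive already sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Plain-Java simulation of a MapReduce secondary sort (no Hadoop classes used).
 * Natural key: suit. Secondary key: rank (1..13). All names are hypothetical.
 */
public class SecondarySortDemo {

    /** Composite map-output key: natural key (suit) plus secondary key (rank). */
    record CardKey(String suit, int rank) {}

    /** Partitioner: hash the natural key only, so a whole suit goes to one reducer. */
    static int partitionFor(CardKey key, int numReducers) {
        return (key.suit().hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Sort comparator: suit first, then rank; this is the "secondary" sort. */
    static final Comparator<CardKey> SORT_ORDER =
            Comparator.comparing(CardKey::suit).thenComparingInt(CardKey::rank);

    /** Grouping comparator: keys with the same suit form a single reduce group. */
    static final Comparator<CardKey> GROUPING = Comparator.comparing(CardKey::suit);

    /** Runs the simulated shuffle and reduce; returns lines like "H missing 7". */
    static List<String> run(List<CardKey> mapOutput, int numReducers) {
        List<String> output = new ArrayList<>();
        for (int p = 0; p < numReducers; p++) {
            // Shuffle: gather this reducer's keys and sort them by the full composite key.
            List<CardKey> bucket = new ArrayList<>();
            for (CardKey key : mapOutput) {
                if (partitionFor(key, numReducers) == p) {
                    bucket.add(key);
                }
            }
            bucket.sort(SORT_ORDER);

            // Reduce: one group per suit; ranks arrive in ascending order,
            // so only the previous rank is kept instead of buffering the group.
            int i = 0;
            while (i < bucket.size()) {
                CardKey groupKey = bucket.get(i);
                int prev = 0;
                while (i < bucket.size() && GROUPING.compare(bucket.get(i), groupKey) == 0) {
                    int rank = bucket.get(i).rank();
                    for (int missing = prev + 1; missing < rank; missing++) {
                        output.add(groupKey.suit() + " missing " + missing);
                    }
                    prev = Math.max(prev, rank);
                    i++;
                }
                // Anything after the highest rank seen is also missing.
                for (int missing = prev + 1; missing <= 13; missing++) {
                    output.add(groupKey.suit() + " missing " + missing);
                }
            }
        }
        return output;
    }
}
```

For example, feeding `run` a shuffled pile of hearts without the 7 and spades without the 2 and 12 yields "H missing 7", "S missing 2", and "S missing 12". In a real Hadoop job these three pieces would typically be registered on `org.apache.hadoop.mapreduce.Job` via `setPartitionerClass`, `setSortComparatorClass`, and `setGroupingComparatorClass`, with the composite key implemented as a `WritableComparable`; the sketch only mimics that behavior in memory.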