One reducer never completes


Expert Contributor

We are seeing an issue with our MR job: one reducer never completes and keeps running for 8-9 hours. We initially tried increasing the reducer memory and raising the number of reducers from 40 to 60.

 

We noticed that Reduce Output Records is increasing at a slow pace, and nothing is being written to disk.

 

Out of the 60 reducers, reducer #32 is the one taking longer; the rest complete in around 1 minute. We suspect it's a data skew issue. How can I troubleshoot this further? Has anyone seen this in the past?

 

1 REPLY

Re: One reducer never completes

Master Guru
Given that the exact same partition always takes longer, this does
appear to be data skew (a lot of data with the same key, or a key choice
that could be improved), or an incorrect hashing technique for your keys.
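
To illustrate why adding reducers does not relieve this kind of skew, here is a minimal sketch in Python. It assumes the job uses Hadoop's default HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The key values and record counts are made up, and Java's String.hashCode() is used as a stand-in hash (a Text key hashes differently, and your job may use a custom partitioner), so treat the reducer numbers it prints as illustrative only.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() for ASCII strings (signed 32-bit result)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap like 32-bit int overflow
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Same arithmetic as the default hash partitioner: clear the sign bit, then modulo."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def records_per_reducer(keys, num_reducers: int) -> Counter:
    """Count how many records each reducer would receive."""
    return Counter(partition(k, num_reducers) for k in keys)


# Hypothetical input: one hot key plus 1,000 keys that each appear once.
keys = ["UNKNOWN"] * 100_000 + [f"user-{i}" for i in range(1000)]

for n in (40, 60):
    counts = records_per_reducer(keys, n)
    reducer, load = counts.most_common(1)[0]
    print(f"{n} reducers: reducer {reducer} receives {load} of {len(keys)} records")
```

Every record with the same key lands in the same partition, so going from 40 to 60 reducers only spreads the small keys more thinly; the hot key still goes to a single reducer.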

Overall, if you find that your other reducers have processed 0 or a much
smaller number of input groups/records than the long-running one, you can
chalk it up to the above.
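
As a rough way to do that comparison, the sketch below flags reducers whose input record count is far above the median. It assumes you have already copied each reducer's "Reduce input records" counter out of the job history UI into a dictionary; the function name, the 10x threshold, and the sample numbers are all hypothetical.

```python
from statistics import median


def find_skewed_reducers(input_records: dict, factor: float = 10.0) -> list:
    """Return reducer ids whose input record count exceeds factor x the median."""
    typical = median(input_records.values())
    # Guard against a median of 0, which happens when most reducers got no input.
    threshold = factor * max(typical, 1)
    return sorted(r for r, n in input_records.items() if n > threshold)


# Hypothetical counter values for 60 reducers, with reducer 32 far above the rest.
input_records = {r: 20_000 + 50 * r for r in range(60)}
input_records[32] = 9_000_000

print(find_skewed_reducers(input_records))
```

The same check can be run on "Reduce input groups". A reducer with a very high record count but few groups points to a handful of hot keys, while a high group count suggests the partitioning itself is sending too many distinct keys to one reducer.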