SYMPTOM:

A Hive query with a GROUP BY clause over a large amount of data stays stuck in the reducer phase for a very long time.

ROOT CAUSE:

This happens when the GROUP BY clause is not optimized for skewed data. By default, Hive sends all rows with the same group-by key to the same reducer. If the values of the group-by columns are skewed, that is, a few key values account for most of the rows, one reducer receives most of the shuffled data and the query appears stuck on that reducer for a very long time.
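One way to confirm that skew is the cause is to count rows per group-by key and look at the largest groups. The query below is a minimal sketch; the table `events` and the column `customer_id` are hypothetical names used only for illustration, so substitute the table and group-by columns from the slow query.

```sql
-- Hypothetical table and column names, for illustration only.
-- Lists the ten group-by keys with the most rows.
SELECT customer_id,
       COUNT(*) AS row_count
FROM events
GROUP BY customer_id
ORDER BY row_count DESC
LIMIT 10;
```

If the top one or two keys hold a large share of the total row count, the reducer that receives those keys is the likely straggler.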

WORKAROUND:

Increasing the Tez container memory will not help in this case. Instead, reduce the effect of the data skew by setting the following properties before running the query:

> set hive.tez.auto.reducer.parallelism=true;
> set hive.groupby.skewindata=true;
> set hive.optimize.skewjoin=true;
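As a rough illustration, the properties above can be set at the start of the same session as the slow query. The table `events` and the columns `customer_id` and `amount` are again hypothetical. To the best of my understanding, `hive.groupby.skewindata` makes Hive aggregate in two stages (rows are first spread across reducers for partial aggregation, then merged by group-by key), `hive.tez.auto.reducer.parallelism` lets Tez adjust the number of reducers at run time, and `hive.optimize.skewjoin` targets skewed join keys, so it only matters if the query also contains a join.

```sql
-- Session-level settings from the workaround; they apply only to this session.
SET hive.tez.auto.reducer.parallelism=true;
SET hive.groupby.skewindata=true;
SET hive.optimize.skewjoin=true;

-- Hypothetical aggregation query; replace with the query that was stuck.
SELECT customer_id,
       SUM(amount) AS total_amount
FROM events
GROUP BY customer_id;
```

Running `EXPLAIN` on the query before and after changing the settings is a simple way to check whether the plan actually changed.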

Version history
Revision #:
1 of 1
Last update:
‎06-30-2017 09:30 PM