Need to understand why Job taking long time in reduce phase, approximately 2 hrs for one reduce task

Re: Need to understand why Job taking long time in reduce phase, approximately 2 hrs for one reduce task

Contributor

Have you enabled map-side compression to reduce the amount of data moved across the cluster when shuffling to the reducers? How long did your longest map task take to run (start time and end time)?

set mapreduce.map.output.compress=true;

set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

Re: Need to understand why Job taking long time in reduce phase, approximately 2 hrs for one reduce task

Rising Star

@Joseph Niemiec, the longest-running mapper took 2 mins 54 secs.

Compression is currently set to false. As suggested above, it makes sense to compress the map output and send the compressed data across the network.

Since this is a Hive job, I believe the property below is the one that needs to be enabled.

hive.exec.compress.intermediate=true

I don't see an option in Hive to change the compression codec for the intermediate data. It looks like it picks up the Hadoop default codec, which has not been defined in our environment, so I guess I will have to set this property and test it out.

Is there a way to pass the MapReduce intermediate compression settings from the job instead of making a global change?

Re: Need to understand why Job taking long time in reduce phase, approximately 2 hrs for one reduce task

Contributor

Because you're running the Hive query on the MR engine, the MR properties will be respected. You can tune slowstart, heap sizes, and compression just by setting MR properties in the Hive session or job, as below. Don't bother setting the Hive property you listed above if we explicitly set the MR ones. Also, can we get a screenshot of your counters page? You can get to it from the overview page, on the left. I am most interested in the 'MapReduce Framework Counters'.

##Setting MR Props in Hive##

set mapreduce.map.output.compress=true;

set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
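
The slowstart and heap-size settings mentioned above can be set per session in the same way. This is only a sketch, assuming the MRv2 property names; the values are illustrative placeholders, not tuned recommendations for this job.

```sql
-- Illustrative values only, tune them against your own counters.
-- Start reducers after 80% of the maps have finished (the default is 0.05).
set mapreduce.job.reduce.slowstart.completedmaps=0.8;
-- Reducer container size in MB, with the JVM heap kept at roughly 80% of it.
set mapreduce.reduce.memory.mb=4096;
set mapreduce.reduce.java.opts=-Xmx3276m;
```

Settings made this way apply only to the current Hive session, so nothing changes globally. The same properties can also be passed on the command line, for example `hive --hiveconf mapreduce.map.output.compress=true -f your_query.hql` (the file name here is hypothetical).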

Re: Need to understand why Job taking long time in reduce phase, approximately 2 hrs for one reduce task

Rising Star

Thanks, I will try setting those MR properties through Hive. Below is a screenshot of the MR framework counters.

1575-mapreduce-framework-counters.png

Re: Need to understand why Job taking long time in reduce phase, approximately 2 hrs for one reduce task

Rising Star

@Neeraj Sabharwal, this best practice document is really helpful, thank you.

Re: Need to understand why Job taking long time in reduce phase, approximately 2 hrs for one reduce task

New Contributor

@Jagdish Saripella, @Joseph Niemiec

Have you figured out the problem? I ran into a similar issue. This is unlikely to be a data-skew issue.
