Have you enabled map-side compression to reduce the amount of data moved across the cluster when shuffling to the reducers? How long did your longest map task take to run (start time and end time)?
@Joseph Niemiec, the longest-running mapper took 2 mins 54 secs.
Compression is currently set to false. As suggested above, it makes sense to compress the map output and send the compressed data across the network.
Since this is a Hive job, I believe the properties below are the ones that need to be enabled.
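The property list itself is not shown in the thread. As a minimal sketch, and assuming the setting meant here is Hive's own intermediate-compression switch, it would look something like this when set per session:

```sql
-- Assumption: the original post's property list is not preserved;
-- this is the Hive-level switch for compressing intermediate data between stages.
SET hive.exec.compress.intermediate=true;
```

Running `SET hive.exec.compress.intermediate;` with no value prints the current setting, which is a quick way to confirm the change took effect in the session.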
I don't see an option to change the compression format for intermediate data in Hive; it looks like it picks up the Hadoop default compression codec. That codec has not been defined in our environment, so I guess I will have to set this property and test it out.
Is there a way to pass the MapReduce intermediate compression settings from the job instead of making a global change?
Because you're running the Hive query on the MR engine, the MR props will be respected. You can tune your slowstart, heap sizes, and compression just by setting MR props in the Hive session/job, like below. Don't bother setting the Hive properties you have above if we explicitly set the MR ones. Also, can we get a screenshot of your counters page? You can get to it from the overview page on the left. I am most interested in the 'MapReduce Framework Counters'.
##Setting MR Props in Hive##
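The settings that followed this heading are not preserved in the thread. A rough sketch of what session-level MR overrides typically look like is below. The property names are standard Hadoop 2 MapReduce settings, but every value and the choice of Snappy as the codec are illustrative assumptions, not recommendations from the original post.

```sql
-- Compress map output before it is shuffled to the reducers.
SET mapreduce.map.output.compress=true;
-- Assumed codec; Snappy needs the native libraries installed on the cluster nodes.
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Fraction of map tasks that must finish before reducers are scheduled (illustrative value).
SET mapreduce.job.reduce.slowstart.completedmaps=0.8;

-- Map container size and JVM heap (illustrative values; keep the heap below the container size).
SET mapreduce.map.memory.mb=2048;
SET mapreduce.map.java.opts=-Xmx1638m;
```

Because these are issued with `SET` inside the Hive session or script, they apply only to jobs launched from that session and leave the cluster-wide configuration untouched.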
Thanks, I will try setting those MR properties through Hive. Below is the MR framework counters screenshot.