
Reduce Shuffle Bytes in MR1


New Contributor

We have migrated a job (and relevant data) to a new cluster.  It should be noted that the newer cluster is smaller, but the job still fits within the number of available containers.  Everything works as expected up until the reduce phase.  We have noticed this takes a lot longer on the new cluster.  In particular, the "Reduce Shuffle bytes" is greatly increased in the new cluster.  Here is a snippet from the results:

 

New Cluster:

000HADOOP:Job Counters .Data-local map tasks=160
000HADOOP:Job Counters .Launched map tasks=171
000HADOOP:Job Counters .Launched reduce tasks=37
000HADOOP:Job Counters .Rack-local map tasks=10
000HADOOP:Job Counters .Total time spent by all maps in occupied slots (ms)=60096803
000HADOOP:Job Counters .Total time spent by all maps waiting after reserving slots (ms)=0
000HADOOP:Job Counters .Total time spent by all reduces in occupied slots (ms)=101496628
000HADOOP:Job Counters .Total time spent by all reduces waiting after reserving slots (ms)=0
000HADOOP:Map-Reduce Framework.CPU time spent (ms)=201093460
000HADOOP:Map-Reduce Framework.Combine input records=0
000HADOOP:Map-Reduce Framework.Combine output records=0
000HADOOP:Map-Reduce Framework.Input split bytes=102405
000HADOOP:Map-Reduce Framework.Map input records=244111696
000HADOOP:Map-Reduce Framework.Map output bytes=152078261229
000HADOOP:Map-Reduce Framework.Map output records=244111696
000HADOOP:Map-Reduce Framework.Physical memory (bytes) snapshot=513454596096
000HADOOP:Map-Reduce Framework.Reduce input groups=244111696
000HADOOP:Map-Reduce Framework.Reduce input records=244111696
000HADOOP:Map-Reduce Framework.Reduce output records=244111696
000HADOOP:Map-Reduce Framework.Reduce shuffle bytes=153054745975
000HADOOP:Map-Reduce Framework.Spilled Records=488223392
000HADOOP:Map-Reduce Framework.Total committed heap usage (bytes)=626664927232
000HADOOP:Map-Reduce Framework.Virtual memory (bytes) snapshot=1034891923456
000HADOOP:MultiInputCounters.Input records from _1_output.dat=244111696

 

 

Prod Cluster:

000HADOOP:Job Counters .Data-local map tasks=150
000HADOOP:Job Counters .Launched map tasks=171
000HADOOP:Job Counters .Launched reduce tasks=37
000HADOOP:Job Counters .Rack-local map tasks=20
000HADOOP:Job Counters .Total time spent by all maps in occupied slots (ms)=62483634
000HADOOP:Job Counters .Total time spent by all maps waiting after reserving slots (ms)=0
000HADOOP:Job Counters .Total time spent by all reduces in occupied slots (ms)=55419788
000HADOOP:Job Counters .Total time spent by all reduces waiting after reserving slots (ms)=0
000HADOOP:Map-Reduce Framework.CPU time spent (ms)=147935890
000HADOOP:Map-Reduce Framework.Combine input records=0
000HADOOP:Map-Reduce Framework.Combine output records=0
000HADOOP:Map-Reduce Framework.Input split bytes=104451
000HADOOP:Map-Reduce Framework.Map input records=244111696
000HADOOP:Map-Reduce Framework.Map output bytes=152078261229
000HADOOP:Map-Reduce Framework.Map output records=244111696
000HADOOP:Map-Reduce Framework.Physical memory (bytes) snapshot=563320909824
000HADOOP:Map-Reduce Framework.Reduce input groups=244111696
000HADOOP:Map-Reduce Framework.Reduce input records=244111696
000HADOOP:Map-Reduce Framework.Reduce output records=244111696
000HADOOP:Map-Reduce Framework.Reduce shuffle bytes=50991891912
000HADOOP:Map-Reduce Framework.Spilled Records=731615420
000HADOOP:Map-Reduce Framework.Total committed heap usage (bytes)=557672038400
000HADOOP:Map-Reduce Framework.Virtual memory (bytes) snapshot=1070805831680
000HADOOP:MultiInputCounters.Input records from _1_output.dat=244111696

 

 

We have synced a few settings between the clusters, such as:

io.sort.factor = 100
io.sort.record.percent = 0.05
io.sort.mb = 1400
io.sort.spill.percent = 0.8
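For reference, these MR1 sort/spill properties are normally set in mapred-site.xml (or per job). A minimal fragment with the values above might look like the following (comments describe the standard MR1 semantics of each property):

```xml
<configuration>
  <!-- Number of streams merged at once when sorting spill files -->
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <!-- Fraction of io.sort.mb reserved for record-boundary metadata -->
  <property>
    <name>io.sort.record.percent</name>
    <value>0.05</value>
  </property>
  <!-- Size, in MB, of the map-side sort buffer -->
  <property>
    <name>io.sort.mb</name>
    <value>1400</value>
  </property>
  <!-- Buffer-fill fraction at which a background spill begins -->
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.80</value>
  </property>
</configuration>
```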

 

Any help on where to investigate would be greatly appreciated.

1 REPLY

Re: Reduce Shuffle Bytes in MR1

Contributor

Hi JimAcxiom,

 

Some interesting details from the counters provided:
  • The reduce shuffle bytes differ by roughly 102 GB between the two clusters, which lines up with an ~885 min difference in CPU time as well (the newer cluster taking much longer)
  • The number of spilled records was much higher on the old cluster: around 243 million more records were spilled there
  • There's a difference of about 50 GB in the physical memory snapshot bytes, with more consumed by the old (prod) cluster (which includes the spills)
  • The old cluster used 36 GB more virtual memory
 
| Description | Diff | Old (prod) cluster | New cluster |
|---|---|---|---|
| Launched map tasks | – | 171 | 171 |
| Launched reduce tasks | – | 37 | 37 |
| Data-local map tasks | 10 | 150 | 160 |
| Rack-local map tasks | 10 | 20 | 10 |
| Time spent by all maps in occupied slots (ms) | ~39 min | 62483634 | 60096803 |
| Time spent by all reduces in occupied slots (ms) | ~767 min | 55419788 | 101496628 |
| MR Framework CPU time spent (ms) | ~885 min | 147935890 | 201093460 |
| MR Framework input split bytes | ~2 KB | 104451 | 102405 |
| MR Framework physical memory snapshot (bytes) | ~49 GB | 563320909824 | 513454596096 |
| MR Framework reduce shuffle bytes | ~102 GB | 50991891912 | 153054745975 |
| MR Framework spilled records | 243392028 | 731615420 | 488223392 |
| MR Framework total committed heap usage (bytes) | ~69 GB | 557672038400 | 626664927232 |
| MR Framework virtual memory snapshot (bytes) | ~36 GB | 1070805831680 | 1034891923456 |
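The headline differences can be checked directly from the raw counters. A quick arithmetic sketch (values copied verbatim from the two counter dumps):

```python
# Sanity-check the headline differences straight from the job counters.
old = {  # prod cluster
    "reduce_shuffle_bytes": 50_991_891_912,
    "spilled_records": 731_615_420,
    "cpu_time_ms": 147_935_890,
}
new = {  # new cluster
    "reduce_shuffle_bytes": 153_054_745_975,
    "spilled_records": 488_223_392,
    "cpu_time_ms": 201_093_460,
}

# ~102 GB more data shuffled to the reducers on the new cluster
shuffle_diff_gb = (new["reduce_shuffle_bytes"] - old["reduce_shuffle_bytes"]) / 1e9

# ~243 million more records spilled on the old (prod) cluster
spill_diff = old["spilled_records"] - new["spilled_records"]

# ~886 min more CPU time burned on the new cluster
cpu_diff_min = (new["cpu_time_ms"] - old["cpu_time_ms"]) / 60_000

print(f"shuffle diff: {shuffle_diff_gb:.0f} GB")
print(f"spill diff:   {spill_diff:,} records")
print(f"CPU diff:     {cpu_diff_min:.0f} min")
```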

Given that the job run on the prod cluster consumed more physical memory and spilled far more records, it may be worth ruling out hardware differences and existing settings as part of the culprit.
 
Have you had a chance to review both clusters (or could you provide some additional details), in particular:
 
  • The hardware differences (number of HDD spindles, RAM, CPU type and number of cores) between both clusters' worker nodes
  • The various compression properties used in both clusters
  • MR-specific properties, such as:
    • mapred.reduce.slowstart.completed.maps
    • mapred.tasktracker.map.tasks.maximum
    • mapred.tasktracker.reduce.tasks.maximum
    • mapred.map.child.ulimit
  • The job.xmls of both jobs, for comparison
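To compare the two job.xmls, a small sketch like the one below can help: it parses Hadoop's standard `<property><name>/<value>` layout and prints only the keys that differ. The inline demo XML and its property value are illustrative; in practice you would pass the paths of the two job.xml files.

```python
import io
import xml.etree.ElementTree as ET

def load_props(source):
    """Read a Hadoop job.xml / *-site.xml (path or file object) into a {name: value} dict."""
    root = ET.parse(source).getroot()
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }

def diff_props(a, b):
    """Return {name: (value_in_a, value_in_b)} for every key whose values differ."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }

# Tiny inline demo -- substitute the real job.xml paths from each JobTracker.
prod_xml = io.StringIO(
    "<configuration>"
    "<property><name>mapred.compress.map.output</name><value>true</value></property>"
    "</configuration>"
)
new_xml = io.StringIO(
    "<configuration>"
    "<property><name>mapred.compress.map.output</name><value>false</value></property>"
    "</configuration>"
)
changed = diff_props(load_props(prod_xml), load_props(new_xml))
for name, (prod_val, new_val) in changed.items():
    print(f"{name}: prod={prod_val!r} new={new_val!r}")
```

Compression-related keys such as mapred.compress.map.output are especially worth a close look here, since compressed map output directly shrinks the reduce shuffle bytes on the wire.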

 

Kind Regards,

 

Anthony

 
