Short Description:

TeraGen and TeraSort performance testing on AWS


Use this article with extreme care, and do not treat it as a benchmark. I ran a quick 1-terabyte TeraGen test on AWS simply to see what kind of performance I could get from MapReduce on AWS with very little configuration tuning.

On my GitHub page you will find the following:

  • teragen script
  • Hadoop, YARN, mapred, and capacity scheduler configurations used during testing
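
As a rough sketch of what a 1 TB run involves (this is not the script from the GitHub page), the snippet below builds the three command lines for the standard Hadoop examples jar. TeraGen takes a row count rather than a byte size, and each row is 100 bytes, so 1 TB corresponds to 10,000,000,000 rows. The jar path and the HDFS directory names are assumptions for illustration only; the real jar location depends on the distribution.

```python
import shlex

ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def tera_commands(total_bytes, examples_jar, base_dir):
    """Return the teragen, terasort and teravalidate command lines as arg lists."""
    rows = total_bytes // ROW_BYTES
    gen_dir = f"{base_dir}/teragen"            # hypothetical HDFS directory
    sort_dir = f"{base_dir}/terasort"          # hypothetical HDFS directory
    report_dir = f"{base_dir}/teravalidate"    # hypothetical HDFS directory
    return [
        ["hadoop", "jar", examples_jar, "teragen", str(rows), gen_dir],
        ["hadoop", "jar", examples_jar, "terasort", gen_dir, sort_dir],
        ["hadoop", "jar", examples_jar, "teravalidate", sort_dir, report_dir],
    ]


if __name__ == "__main__":
    # Placeholder jar path: adjust to wherever your distribution installs it.
    jar = "/path/to/hadoop-mapreduce-examples.jar"
    for cmd in tera_commands(10**12, jar, "/benchmarks/tera"):
        print(shlex.join(cmd))
```

The script only prints the commands so they can be reviewed before anything is launched on the cluster.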

Hardware (master and data nodes):

1 master, 3 data nodes

d2.4xlarge: 16 vCPUs, 122 GB RAM, (max) 12 x 2000 GB storage

TeraGen Results:

1 hr, 6 min, 38 sec

Job Counters:


TeraSort Results:

1 hr, 34 min, 20 sec


TeraValidate Results:

25 min, 27 sec
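
To put the three timings on a common scale, here is a small sketch that converts each elapsed time into an aggregate throughput figure. It assumes the data set is exactly 10^12 bytes (1 TB in decimal units) and that TeraSort and TeraValidate processed that same data set; neither is stated precisely above, so treat the numbers as approximate.

```python
TOTAL_BYTES = 10**12  # assumption: "1 terabyte" means 10^12 bytes

# Elapsed times reported above, as (hours, minutes, seconds)
RESULTS = {
    "TeraGen": (1, 6, 38),
    "TeraSort": (1, 34, 20),
    "TeraValidate": (0, 25, 27),
}


def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s triple to total seconds."""
    return hours * 3600 + minutes * 60 + seconds


def throughput_mb_s(total_bytes, elapsed_s):
    """Aggregate throughput in MB/s (1 MB = 10^6 bytes)."""
    return total_bytes / 1e6 / elapsed_s


if __name__ == "__main__":
    for phase, hms in RESULTS.items():
        elapsed = to_seconds(*hms)
        rate = throughput_mb_s(TOTAL_BYTES, elapsed)
        print(f"{phase}: {elapsed} s, {rate:.1f} MB/s across the cluster")
```

Under those assumptions this works out to roughly 250 MB/s for TeraGen, 177 MB/s for TeraSort, and 655 MB/s for TeraValidate, summed over the three data nodes.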


