Short Description:

TeraGen and TeraSort performance testing on AWS

Article

Use this article with extreme care, and do not treat it as a benchmark. I ran this test simply as a quick 1-terabyte TeraGen run on AWS, to see what kind of performance I could get from MapReduce on AWS with VERY LITTLE configuration tweaking or tuning.

On my GitHub page here you will find the following:

  • TeraGen script
  • Hadoop, YARN, mapred, and Capacity Scheduler configurations used during testing
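The script on GitHub is not reproduced here. As a rough sketch of how a 1-terabyte run is sized: TeraGen takes a row count rather than a byte count, and each row is 100 bytes, so 1 TB works out to 10 billion rows. The jar path and HDFS directories below are hypothetical placeholders, not the values used in this test, and the snippet only builds the command lines without running them.

```python
from typing import List

ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(total_bytes: int) -> int:
    """Number of rows to request from TeraGen for a target dataset size."""
    return total_bytes // ROW_BYTES


def tera_commands(total_bytes: int, examples_jar: str, base_dir: str) -> List[List[str]]:
    """Build the teragen / terasort / teravalidate command lines (not executed)."""
    rows = teragen_rows(total_bytes)
    gen_dir = f"{base_dir}/teragen"
    sort_dir = f"{base_dir}/terasort"
    report_dir = f"{base_dir}/teravalidate"
    return [
        ["hadoop", "jar", examples_jar, "teragen", str(rows), gen_dir],
        ["hadoop", "jar", examples_jar, "terasort", gen_dir, sort_dir],
        ["hadoop", "jar", examples_jar, "teravalidate", sort_dir, report_dir],
    ]


if __name__ == "__main__":
    # Assumption: 1 TB is taken as 10**12 bytes; jar path and base dir are placeholders.
    for cmd in tera_commands(10**12, "hadoop-mapreduce-examples.jar", "/tmp/tera"):
        print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run` on a node with the Hadoop client installed, once the placeholder paths are replaced with real ones.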

Hardware (master and data nodes):

1 master, 3 data nodes

d2.4xlarge: 16 vCPUs, 122 GB RAM, (max) 12 x 2000 GB storage

TeraGen Results:

1 hr, 6 mins, 38 sec

Job Counters:

5664-teragen-counters.jpg

TeraSort Results:

1 hr, 34 mins, 20 sec

5666-terasort.jpg

TeraValidate Results:

25 mins, 27 sec

5667-teravalidate.jpg
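To put the three run times on a common footing, the sketch below converts each duration to seconds and derives a rough aggregate throughput. It assumes the dataset is exactly 10**12 bytes and that every phase processed the whole dataset. Treat the output as back-of-the-envelope arithmetic, not as a benchmark figure.

```python
DATASET_BYTES = 10**12  # assumption: 1 TB taken as a decimal terabyte


def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert an h/m/s duration to total seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Wall-clock durations reported in this article
PHASES = {
    "teragen": to_seconds(1, 6, 38),
    "terasort": to_seconds(1, 34, 20),
    "teravalidate": to_seconds(0, 25, 27),
}


def throughput_mb_per_s(total_bytes: int, elapsed_seconds: int) -> float:
    """Aggregate cluster throughput in MB/s (1 MB = 10**6 bytes)."""
    return total_bytes / elapsed_seconds / 1e6


if __name__ == "__main__":
    for name, elapsed in PHASES.items():
        rate = throughput_mb_per_s(DATASET_BYTES, elapsed)
        print(f"{name}: {elapsed} s, about {rate:.0f} MB/s across the cluster")
```

Dividing each figure by the three data nodes gives a per-node estimate, under the same assumptions.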
