Support Questions

Find answers, ask questions, and share your expertise

Sanity Check / Cluster Validation documents?

Contributor

Do we have any public-consumable documents for "Sanity Checking" a cluster? Aside from running the service checks and ensuring all services start and stop properly, are there any other tests that are run in the field to help validate and ensure the cluster is running acceptably?

Thanks!

1 ACCEPTED SOLUTION

Master Mentor

*** NameNode Exercise ***

*** Log in as the hdfs user before running these tests ***

*** TestDFSIO Write Test ***

# The -fileSize argument is in MB by default; 100 files x 100 MB should write 10 GB
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-*tests.jar TestDFSIO -write -nrFiles 100 -fileSize 100

*** TestDFSIO Read Test ***

# Reads back the files created by the write test
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-*tests.jar TestDFSIO -read -nrFiles 100 -fileSize 100

*** TestDFSIO Cleanup ***

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-*tests.jar TestDFSIO -clean
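As a quick sanity check on the numbers above, the total volume written is just -nrFiles times -fileSize. This is a small arithmetic sketch (not part of the original post, and it does not invoke TestDFSIO itself):

```shell
# Compute the total data TestDFSIO -write will produce for the
# parameters used above (-nrFiles 100, -fileSize 100 MB).
nrFiles=100
fileSizeMB=100
totalMB=$((nrFiles * fileSizeMB))
echo "TestDFSIO will write ${totalMB} MB total (~$((totalMB / 1000)) GB)"
```

Scale -nrFiles up (or spread the same total over more, smaller files) to exercise more concurrent map tasks across the cluster.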

Read/Write Data

*** TeraGen ***

hdfs dfs -mkdir /benchmarks
hdfs dfs -mkdir /benchmarks/terasort

# Generate 1,000,000 100-byte records as input for TeraSort
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 1000000 /benchmarks/terasort/terasort-input

*** TeraSort ***

# Sort the 1,000,000 records generated by TeraGen
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /benchmarks/terasort/terasort-input /benchmarks/terasort/terasort-output

*** TeraValidate ***

# Validate that the sort was successful and correct
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teravalidate /benchmarks/terasort/terasort-output /benchmarks/terasort/teravalidate-output
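One way to read the TeraValidate result: a clean run typically leaves only a checksum record in the output part files, while any other records indicate out-of-order keys. The snippet below is a hedged local simulation (the part file content is hypothetical; on a real cluster you would inspect the actual part files in HDFS):

```shell
# Simulated TeraValidate output directory; the part file content below is
# hypothetical -- on a real cluster you would inspect the part files with
# "hdfs dfs -cat /benchmarks/terasort/teravalidate-output/part-*".
mkdir -p /tmp/teravalidate-output
printf 'checksum\t3d0b2babe9a7f2\n' > /tmp/teravalidate-output/part-r-00000

# A clean validation typically leaves only a checksum record; anything else
# suggests the sorted output was not actually in order.
if grep -vh '^checksum' /tmp/teravalidate-output/part-* | grep -q .; then
  status=FAILED
else
  status=passed
fi
echo "sort validation: $status"
```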


4 REPLIES

Expert Contributor

Hey Kent,

I've always been a fan of running TestDFSIO (to stress the spindles), like the following:

hadoop jar /usr/lib/hadoop/hadoop-*test*.jar TestDFSIO -write -nrFiles 64 -fileSize 16GB

Follow that up with a TeraGen/TeraSort job: first create the data using TeraGen, then run TeraSort (a MapReduce job) on the generated data set.

hadoop jar hadoop-*examples*.jar teragen 10000000000 /user/hduser/terasort-input

hadoop jar hadoop-*examples*.jar terasort /user/hduser/terasort-input /user/hduser/terasort-output
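For scale, each TeraGen record is 100 bytes, so the 10000000000-row run above produces roughly 1 TB before HDFS replication. A quick arithmetic sketch (not part of the original reply):

```shell
# Each TeraGen record is 100 bytes, so the row count determines the
# dataset size (before HDFS replication multiplies the on-disk footprint).
rows=10000000000
bytesPerRow=100
totalBytes=$((rows * bytesPerRow))
echo "teragen dataset: ${totalBytes} bytes (~$((totalBytes / 1000000000000)) TB)"
```

Shrink the row count for a smoke test, or grow it until the job meaningfully exercises disk and network on every worker node.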

Good documentation on this topic is located here:

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-t...


Super Collaborator

Would like to call out that with the latest version of the tests jar, it is better to specify the path for the test result file with the -resFile parameter, as shown below. Please note that this is a path on the local filesystem (not an HDFS directory).

hadoop jar hadoop-mapreduce-client-jobclient-2.7.1.2.3.4.0-3485-tests.jar TestDFSIO -read -nrFiles 5 -fileSize 1000 -resFile /tmp/TestDFSIO_results.log
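Once the run finishes, the summary in -resFile can be pulled apart with awk for trending across runs. The sample log below is hypothetical (the field names follow the usual TestDFSIO summary style, but exact output varies by Hadoop version):

```shell
# Hypothetical sample of a TestDFSIO summary written via -resFile; the exact
# fields and values vary by Hadoop version and cluster.
cat > /tmp/TestDFSIO_results.log <<'EOF'
----- TestDFSIO ----- : read
           Number of files: 5
    Throughput mb/sec: 85.2
 Average IO rate mb/sec: 91.3
EOF

# Extract the throughput figure so repeated runs can be compared over time.
tput=$(awk -F': ' '/Throughput mb\/sec/ {print $2}' /tmp/TestDFSIO_results.log)
echo "read throughput: ${tput} mb/sec"
```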

Expert Contributor

I haven't seen a full document that covers sanity checking the entire cluster; this is often performed by the PS team during customer engagements. Side note: the most important individual component test I use to smoke-test a cluster is Hive-TestBench.