Created 10-05-2015 10:27 PM
Do we have any public-consumable documents for "Sanity Checking" a cluster? Aside from running the service checks and ensuring all services start and stop properly, are there any other tests that are run in the field to help validate and ensure the cluster is running acceptably?
Thanks!
Created 10-05-2015 11:32 PM
Hey Kent,
I've always been a fan of running a TestDFSIO job (to stress the spindles), like the following:
hadoop jar /usr/lib/hadoop/hadoop-*test*.jar TestDFSIO -write -nrFiles 64 -fileSize 16GB
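If you want to exercise the read path as well, the same jar can be run in read mode with matching parameters and then cleaned up afterwards. A minimal sketch, simply mirroring the write run above (same jar path and file sizes):
# Read back the same 64 x 16 GB files, then remove the TestDFSIO data from HDFS
hadoop jar /usr/lib/hadoop/hadoop-*test*.jar TestDFSIO -read -nrFiles 64 -fileSize 16GB
hadoop jar /usr/lib/hadoop/hadoop-*test*.jar TestDFSIO -clean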
Follow that up with a TeraGen/TeraSort job: first create the data using TeraGen, then execute TeraSort (a MapReduce job) on the generated data set.
hadoop jar hadoop-*examples*.jar teragen 10000000000 /user/hduser/terasort-input
hadoop jar hadoop-*examples*.jar terasort /user/hduser/terasort-input /user/hduser/terasort-output
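As an optional follow-up, you can run TeraValidate against the sorted output to confirm the ordering; the validate output path below is just an illustrative choice, not part of the original commands:
# Verify the TeraSort output is globally sorted; writes its report to the given HDFS directory
hadoop jar hadoop-*examples*.jar teravalidate /user/hduser/terasort-output /user/hduser/terasort-validate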
Good documentation on this topic is located here.
Created 10-06-2015 12:49 PM
*** NameNode Exercise ***
*** Login as the hdfs user ***

*** TestDFSIO Write Test ***
# The -fileSize argument is in units of MB by default; this writes 100 files of 100 MB each (~10 GB total)
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-*tests.jar TestDFSIO -write -nrFiles 100 -fileSize 100

*** TestDFSIO Read Test ***
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-*tests.jar TestDFSIO -read -nrFiles 100 -fileSize 100

*** TestDFSIO Cleanup ***
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-*tests.jar TestDFSIO -clean
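TestDFSIO also appends a summary of each run (throughput, average IO rate, and so on) to a results file on the local filesystem of the node you launched it from. A quick way to review the numbers after the write and read runs, assuming the default file name (which may vary by version), is:
# View the locally written TestDFSIO summary (local filesystem, not HDFS)
cat TestDFSIO_results.log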
*** Read/Write Data ***
*** TeraGen ***
hdfs dfs -mkdir /benchmarks
hdfs dfs -mkdir /benchmarks/terasort
# This generates 1,000,000 100-byte records as input for TeraSort
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 1000000 /benchmarks/terasort/terasort-input
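To sanity-check that TeraGen actually produced the expected volume of data (1,000,000 x 100 bytes, roughly 100 MB), you can check the size of the input directory:
# Should report on the order of 100 MB for 1,000,000 100-byte records
hdfs dfs -du -s -h /benchmarks/terasort/terasort-input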
*** TeraSort ***
# Sort the 1,000,000 records generated by TeraGen
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /benchmarks/terasort/terasort-input /benchmarks/terasort/terasort-output
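Before validating, an optional quick check that the sort job actually produced output:
# List the sorted output; a successful job leaves a _SUCCESS marker and one or more part files
hdfs dfs -ls /benchmarks/terasort/terasort-output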
*** TeraValidate ***
# Validate that the sort was successful and correct
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teravalidate /benchmarks/terasort/terasort-output /benchmarks/terasort/teravalidate-output
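If TeraValidate finds records out of order it writes error entries into its output directory, so a reasonable spot check (assuming the output is small enough to cat) is something like:
# No error entries in the output indicates the sort was correct
hdfs dfs -cat /benchmarks/terasort/teravalidate-output/*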
Created 02-22-2016 03:44 AM
I'd like to call out that with the latest version of the tests JAR, it is better to specify the path for the test result file explicitly with the -resFile parameter, as shown below. Please note that this is a path on the local filesystem, not an HDFS directory.
hadoop jar hadoop-mapreduce-client-jobclient-2.7.1.2.3.4.0-3485-tests.jar TestDFSIO -read -nrFiles 5 -fileSize 1000 -resFile /tmp/TestDFSIO_results.log
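For reference, a minimal way to review the results afterwards; the exact metric names in the file (throughput and average IO rate in MB/sec, execution time, etc.) may differ slightly between versions:
# The file accumulates one summary block per run on the local filesystem
tail -n 20 /tmp/TestDFSIO_results.log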
Created 05-25-2016 10:40 AM
I haven't seen a full document that covers sanity-checking an entire cluster; this is often performed by the PS team during customer engagements. Side note: the most valuable individual component test I use to smoke-test a cluster is Hive-TestBench.
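For a lighter-weight Hive smoke test than a full Hive-TestBench run, a minimal sketch along these lines usually suffices; the HiveServer2 host/port and database are placeholders you would adjust for your cluster:
# Create, populate, query, and drop a throwaway table through HiveServer2
beeline -u "jdbc:hive2://<hs2-host>:10000/default" -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT); INSERT INTO smoke_test VALUES (1); SELECT COUNT(*) FROM smoke_test; DROP TABLE smoke_test;"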