Do we have any publicly available documents for "sanity checking" a cluster? Aside from running the service checks and ensuring all services start and stop properly, are there any other tests run in the field to validate that the cluster is performing acceptably?
One thing to call out: with the latest version of the tests jar, it is better to set the path of the test result file explicitly with the -resFile parameter. Please note that this is a path on the local filesystem, not an HDFS directory.
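For illustration, here is a minimal sketch of how such a command line could be assembled. It assumes the benchmark in question is TestDFSIO (the post only says "tests-jar"), and the jar path, result file name, and the -nrFiles/-size values are placeholders I made up, so adjust them for your distribution and version.

```python
import os
import shlex


def build_dfsio_command(tests_jar, res_file, nr_files=4, size="128MB"):
    """Build the argument list for a TestDFSIO write run.

    Assumption: the tests jar exposes TestDFSIO, which accepts -resFile.
    The nr_files and size defaults are illustrative only.
    """
    # -resFile points at the local filesystem of the host that submits
    # the job, not at HDFS, so resolve it to an absolute local path.
    local_res_file = os.path.abspath(os.path.expanduser(res_file))
    return [
        "hadoop", "jar", tests_jar,
        "TestDFSIO", "-write",
        "-nrFiles", str(nr_files),
        "-size", size,
        "-resFile", local_res_file,
    ]


if __name__ == "__main__":
    # Hypothetical jar location and result file; replace with real paths.
    cmd = build_dfsio_command(
        "/path/to/hadoop-mapreduce-client-jobclient-tests.jar",
        "/tmp/TestDFSIO_write_results.txt",
    )
    # Print the command so it can be reviewed before it is actually run.
    print(shlex.join(cmd))
```

The script only prints the command rather than running it, so you can check the paths first. Passing an absolute local path avoids the result file landing in whatever directory the job happened to be launched from.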
I haven't seen a full document that covers sanity checking the entire cluster. This is usually done by the PS team at customer engagements. Side note: the single component test I rely on most to smoke test a cluster is Hive-TestBench.