Current process: Click a page > Wait > Copy table to Excel > Repeat (tedious!)
I use the information for some summary metrics about usage of our database and error checking on batch queries.
Note: Since the Tez UI Ambari view is going away in 2.7 (reference link), I would REALLY like to find a way to get at this information programmatically.
<Screenshot of Tez Hive Queries screen attached>
Tez actually ships with a Pig loader for mining Tez logs. You can find the details at https://github.com/apache/tez/tree/master/tez-tools/tez-tfile-parser
Here's a sample:
set pig.splitCombination false;
set tez.grouping.min-size 52428800;
set tez.grouping.max-size 52428800;

/* Register all tez jars. Replace $TEZ_HOME, $TEZ_TFILE_DIR with absolute path */
register '$TEZ_HOME/*.jar';
register '$TEZ_TFILE_DIR/tfile-parser-1.0-SNAPSHOT.jar';

raw = load '/app-logs/root/logs/application_1411511669099_0769/*' using org.apache.tez.tools.TFileLoader() as (machine:chararray, key:chararray, line:chararray);
filterByLine = FILTER raw BY (key MATCHES '.*container_1411511669099_0769_01_000001.*') AND (line MATCHES '.*Shuffle.*');
dump filterByLine;
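If Pig is new to you, it may help to see what the FILTER step in that script does. Below is a rough Python sketch of it. It is not part of Tez or Pig. The sample records, the `records` list, and the `filter_by_line` function are all made up for illustration; real rows come from the TFileLoader as (machine, key, line) tuples. One assumption worth stating: Pig's MATCHES has to match the whole string, which is why the patterns carry a leading and trailing `.*` and why the sketch uses `re.fullmatch`.

```python
import re

# Hypothetical sample rows shaped like the (machine, key, line) tuples
# that the Pig script loads. The values are invented for illustration.
records = [
    ("node1:45454", "container_1411511669099_0769_01_000001",
     "INFO Shuffle assigned with 1 inputs"),
    ("node1:45454", "container_1411511669099_0769_01_000001",
     "INFO Initializing task"),
    ("node2:45454", "container_1411511669099_0769_01_000002",
     "INFO Shuffle assigned with 2 inputs"),
]

# Same patterns as in the Pig FILTER statement.
KEY_PATTERN = re.compile(r".*container_1411511669099_0769_01_000001.*")
LINE_PATTERN = re.compile(r".*Shuffle.*")


def filter_by_line(rows):
    """Keep rows whose key and line both fully match their patterns."""
    return [
        (machine, key, line)
        for machine, key, line in rows
        if KEY_PATTERN.fullmatch(key) and LINE_PATTERN.fullmatch(line)
    ]


if __name__ == "__main__":
    # Rough equivalent of "dump filterByLine;" in the Pig script.
    for row in filter_by_line(records):
        print(row)
```

To pull out different information, you would change the two patterns (for example, a different container id or a keyword other than Shuffle) in the Pig script in the same way.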
Thank you both. This makes me very hopeful.
May I ask for one more layer of context (since working with APIs and Pig scripts is new to me)?
Something along the lines of...
Thanks for your patience with a neophyte. :)