I am looking for a way to assess Hive queries before they are submitted, based on their anticipated processing time. Assuming statistics are gathered regularly, CBO is enabled, and so on, what would be a good way to condense all the information that 'explain select.... ' outputs into a single KPI?
The eventual aim is to route high-cost queries to a separate queue (MapReduce) and low-cost queries to Tez.
You can look at Hive hooks. These are Java classes that you write yourself and that Hive calls before execution, after execution, or on failure, depending on which configuration properties you register them under (for example hive.exec.pre.hooks, hive.exec.post.hooks and hive.exec.failure.hooks). Here are some slides about them: http://www.slideshare.net/julingks/apache-hive-hooksminwookim130813.
There is also an article with an example of how to change the queue a query is submitted to: https://community.hortonworks.com/articles/24009/map-hive-jobs-to-yarn-queues.html.
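As for boiling the plan down to one number, here is a rough sketch of one possible approach, not a tested recipe. It assumes the EXPLAIN text contains operator lines of the form "Statistics: Num rows: ... Data size: ..." (the exact layout can differ between Hive versions), and it simply adds up every "Data size" figure as a crude cost proxy. The class name ExplainCost, the 1 GiB threshold, and both method names are made up for illustration. The returned strings "mr" and "tez" are the values you would set for hive.execution.engine, for example from inside a hook.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: condenses EXPLAIN text into a single cost figure.
class ExplainCost {
    // Assumed format: "... Data size: <bytes> ..." on each Statistics line.
    private static final Pattern DATA_SIZE = Pattern.compile("Data size:\\s*(\\d+)");

    // Illustrative cut-off of 1 GiB; calibrate against observed run times.
    static final long THRESHOLD_BYTES = 1L << 30;

    // Sums the estimated data size reported by every operator in the plan.
    static long totalDataSize(String explainOutput) {
        long total = 0;
        Matcher m = DATA_SIZE.matcher(explainOutput);
        while (m.find()) {
            total += Long.parseLong(m.group(1));
        }
        return total;
    }

    // Maps the KPI to an execution engine: expensive queries go to MapReduce.
    static String chooseEngine(long kpi) {
        return kpi > THRESHOLD_BYTES ? "mr" : "tez";
    }

    public static void main(String[] args) {
        // Shortened, hand-written sample in the assumed EXPLAIN layout.
        String plan = String.join("\n",
            "TableScan",
            "  alias: t",
            "  Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE",
            "  Select Operator",
            "    Statistics: Num rows: 250 Data size: 2656 Basic stats: COMPLETE Column stats: NONE");
        long kpi = totalDataSize(plan);
        System.out.println(kpi + " bytes -> " + chooseEngine(kpi));
    }
}
```

Summing over all operators counts the same data more than once as it flows through the plan, so the figure is only meaningful relative to other queries scored the same way. Using just the TableScan lines, or "Num rows" instead of "Data size", would be equally defensible starting points.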