In our production cluster, with impersonation disabled, at least 5000 queries run on a daily basis. I suspect that a few of the queries that are part of batch jobs (1000+) are eating up a lot of cluster resources, possibly because they are poorly written. How do I find out which queries are likely to be 'resource hungry'?
Hi @Smart Solutions, do you use Ambari to manage your cluster? If so, you can use Grafana to visualise the usage of your cluster and perhaps spot the applications that are taking up the most resources.
If you have SmartSense 1.3 or later installed, you can also use the Activity Analyzer to check out a lot of similar information.
Otherwise, it's a little bit difficult if you don't have any monitoring on the cluster. Do you perhaps have YARN queues set up? It may be worth tweaking them in order to prevent batch jobs from taking up so much of the cluster.
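If you have no monitoring in place at all, one thing you could try is asking the YARN ResourceManager directly. Its REST API (`/ws/v1/cluster/apps`) reports aggregate `memorySeconds` and `vcoreSeconds` per application, so a rough ranking of the heaviest jobs is possible. The following is only a minimal sketch: the ResourceManager address is a placeholder, the helper names are made up for illustration, and it assumes the API is reachable without authentication (a Kerberized cluster would need more than this).

```python
import json
from urllib.request import urlopen


def fetch_finished_apps(rm_url):
    """Return the list of finished applications known to the ResourceManager.

    rm_url is the ResourceManager web address, e.g. "http://<rm-host>:8088"
    (placeholder; substitute your own host and port).
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?states=FINISHED") as resp:
        payload = json.load(resp)
    # "apps" can be null when no application matches the filter.
    apps = payload.get("apps") or {}
    return apps.get("app", [])


def top_consumers(apps, n=10, key="memorySeconds"):
    """Return the n applications with the largest value for `key`.

    Applications that lack the field are treated as having used 0.
    """
    return sorted(apps, key=lambda a: a.get(key, 0), reverse=True)[:n]


def print_report(rm_url, n=10):
    """Print one line per heavy application: id, queue, name and usage."""
    for app in top_consumers(fetch_finished_apps(rm_url), n=n):
        print(
            app.get("id"),
            app.get("queue"),
            app.get("name"),
            app.get("memorySeconds"),
            app.get("vcoreSeconds"),
        )
```

Calling `print_report("http://<rm-host>:8088")` would list the ten applications with the highest memory usage over time. Since impersonation is disabled, the `user` field will probably be the same service account for every Hive query, so the application name and queue are likely to be more useful for tracing a job back to its batch.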