Cluster monitoring - Impala performance issues


We host a CDH 5.16 cluster on AWS. Many data scientists use Impala, and they often run badly written queries or queries that end up with a poor execution plan.

We generate custom cluster utilization reports with CPU hours, memory (TB), etc. for both Impala and YARN. However, these reports don't give a clear picture of when a service hangs, or of which other jobs were running at the same time when cluster utilization was high. We can't work this out from the aggregated metrics we collect: a particular job runs for 1 hour when there is no load, but for 3 to 5 hours when utilization is high. We would like to identify such jobs and offload them to another cluster.

Any tips or suggestions on how to collect these metrics? We are also open to third-party monitoring tools. Looking forward to suggestions from the forum. Thanks in advance.
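One way to frame the "1 hour when idle, 3 to 5 hours under load" problem is per-run concurrency. This is only a rough sketch and assumes you can export one record per query or application with a start and end time. Possible sources are the Impala query history in Cloudera Manager or the YARN ResourceManager applications REST endpoint. The `QueryRecord` shape, its field names, and the `busy_threshold` cutoff are hypothetical and chosen just for illustration. The sketch counts how many other runs overlapped each run, groups each job's runs into "quiet" and "busy" buckets, and compares the median durations. A high busy/quiet ratio marks a job whose runtime depends on contention, which makes it a possible candidate for offloading. `overlapping()` lists which other jobs were running at the same time.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class QueryRecord:
    # Hypothetical record shape; map your exported fields onto these.
    job_name: str  # stable label for a recurring job/query
    start: float   # start time, epoch seconds
    end: float     # end time, epoch seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def overlapping(records, target):
    """Return the other records whose [start, end) interval overlaps target's."""
    return [
        r for r in records
        if r is not target and r.start < target.end and target.start < r.end
    ]


def concurrency(records, target):
    """Number of other runs that were active at some point during target."""
    return len(overlapping(records, target))


def slowdown_report(records, busy_threshold):
    """Compare each job's median duration in quiet vs. busy runs.

    A run counts as 'busy' if at least busy_threshold other runs overlap it.
    Returns {job_name: (quiet_median, busy_median, busy/quiet ratio)} for
    jobs that have at least one run in each bucket.
    """
    quiet, busy = {}, {}
    for rec in records:
        bucket = busy if concurrency(records, rec) >= busy_threshold else quiet
        bucket.setdefault(rec.job_name, []).append(rec.duration)

    report = {}
    for name in quiet.keys() & busy.keys():
        q = median(quiet[name])
        b = median(busy[name])
        report[name] = (q, b, b / q)
    return report


if __name__ == "__main__":
    # Made-up sample data: one job run once alone and once alongside two ad hoc queries.
    sample = [
        QueryRecord("daily_agg", 0, 3600),
        QueryRecord("daily_agg", 86400, 100800),
        QueryRecord("adhoc_scan", 87000, 93600),
        QueryRecord("adhoc_join", 87600, 95400),
    ]
    for name, (q, b, ratio) in sorted(
        slowdown_report(sample, busy_threshold=2).items(),
        key=lambda kv: kv[1][2],
        reverse=True,
    ):
        print(f"{name}: quiet={q:.0f}s busy={b:.0f}s slowdown={ratio:.1f}x")
```

The pairwise overlap check is O(n²). That is fine for a day or two of queries. For longer histories, you could sort the runs by start time and use a sweep instead. The busy threshold is a tuning knob. A measured signal such as cluster memory or CPU utilization sampled over each run's interval might be a better cutoff than a plain query count.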