
Impala/YARN : Cluster overloading on AWS


New Contributor

We host a CDH 5.16 cluster on AWS. Many data scientists use Impala, and they often run poorly written queries or queries that end up with a bad plan. We generate custom cluster utilization reports that include CPU hours, memory (TB), and so on for both Impala and YARN. However, these reports don't give a clear picture of when a service hangs or which other jobs are running while cluster utilization is high. The aggregated metrics we collect don't let us work this out: a particular job runs for 1 hour when there is no load, but for 3-5 hours when utilization is high. We would like to identify such jobs and offload them to another cluster. Any tips or suggestions on how to collect such metrics? We are also open to third-party tools. Looking forward to suggestions from the forum. Thanks in advance.


Re: Impala/YARN : Cluster overloading on AWS

Master Collaborator

Not sure if this is an option for you, but WorkloadXM is designed to make visualising and analysing such problems much easier - https://www.cloudera.com/products/workload-xm.html
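
If a dedicated tool is not an option, a rough first pass at the question above could be done on per-job records rather than aggregates: flag runs that took much longer than the same job's best run, then list what else was running in that window. The sketch below is only an illustration under stated assumptions. It assumes job records have already been exported somewhere as (name, start, end) tuples; that record layout, the function names `find_slow_runs` and `overlapping_jobs`, the 2x slowdown threshold, and the sample data are all hypothetical and not part of any Cloudera API.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_slow_runs(runs, slowdown=2.0):
    """Return runs that lasted at least `slowdown` times the shortest
    observed run of the same job name (hypothetical heuristic)."""
    by_name = defaultdict(list)
    for name, start, end in runs:
        by_name[name].append((start, end))

    slow = []
    for name, spans in by_name.items():
        # Use the fastest run seen for this job as its "no load" baseline.
        baseline = min(end - start for start, end in spans)
        if baseline <= timedelta(0):
            continue
        for start, end in spans:
            if (end - start) >= slowdown * baseline:
                slow.append((name, start, end))
    return slow


def overlapping_jobs(runs, window_start, window_end):
    """Return sorted names of jobs running at any point in the window."""
    return sorted({
        name
        for name, start, end in runs
        if start < window_end and end > window_start
    })


# Made-up sample records: (job name, start, end).
runs = [
    ("etl_daily", datetime(2019, 6, 1, 1, 0), datetime(2019, 6, 1, 2, 0)),
    ("etl_daily", datetime(2019, 6, 2, 1, 0), datetime(2019, 6, 2, 5, 0)),
    ("adhoc_impala_scan", datetime(2019, 6, 2, 0, 30), datetime(2019, 6, 2, 4, 30)),
    ("report_build", datetime(2019, 6, 2, 6, 0), datetime(2019, 6, 2, 6, 30)),
]

for name, start, end in find_slow_runs(runs):
    neighbours = overlapping_jobs(runs, start, end)
    print(name, start, end, "ran alongside:", neighbours)
```

Jobs that show up repeatedly as neighbours of slow runs would be the first candidates to look at for offloading. The baseline here is simply the fastest run seen, so a job needs at least two recorded runs before it can be flagged.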