11-30-2017 03:05 PM - last edited on 12-01-2017 05:32 AM by cjervis
We need to be able to report back on cluster usage. I can use the existing reports in CM to figure out the HDFS (or storage) usage for our projects and customers. However, I also need a way to calculate and report on the compute resources used per project or customer/tenant.
Does anyone know how to do this? I have heard it is possible, but I have searched and haven't found anything.
Thanks for any help or suggestions.
12-01-2017 12:43 AM
For MapReduce and Spark jobs (running on YARN), you should be able to report from Resource Pools.
There are charts of per-pool allocation, running containers, and so on.
Impala usage is not included there, of course. For that, you have to report from the Impala Queries workload summary and aggregate manually from the query history. You will have to choose a KPI, whether that is CPU time, HDFS bytes scanned, or something else.
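If it helps, here is a rough sketch of the "pick a KPI and sum it per pool" step. It assumes you have already exported the YARN application or Impala query history into a list of records by whatever means you have available. The field names (`pool`, `vcore_seconds`, `cpu_time_ms`, `hdfs_bytes_scanned`) and the pool names are placeholders I made up for illustration, not the actual attribute names in CM, so adjust them to match your export.

```python
from collections import defaultdict


def usage_by_pool(records, kpi, pool_key="pool"):
    """Sum one numeric KPI per resource pool.

    records  -- iterable of dicts, one per YARN application or Impala query
    kpi      -- name of the numeric field to aggregate (hypothetical names below)
    pool_key -- name of the field holding the pool (assumed to be "pool")
    """
    totals = defaultdict(float)
    for rec in records:
        # Records without a pool are grouped under "unknown" rather than dropped.
        pool = rec.get(pool_key, "unknown")
        totals[pool] += float(rec.get(kpi, 0))
    return dict(totals)


def usage_share(totals):
    """Convert per-pool totals into fractions of the overall total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {pool: 0.0 for pool in totals}
    return {pool: value / grand_total for pool, value in totals.items()}


# Hypothetical sample data; real records would come from your own export.
yarn_apps = [
    {"pool": "root.project_a", "vcore_seconds": 1200},
    {"pool": "root.project_b", "vcore_seconds": 300},
    {"pool": "root.project_a", "vcore_seconds": 500},
]
impala_queries = [
    {"pool": "root.project_a", "cpu_time_ms": 40000, "hdfs_bytes_scanned": 10_000_000},
    {"pool": "root.project_b", "cpu_time_ms": 60000, "hdfs_bytes_scanned": 30_000_000},
]

yarn_totals = usage_by_pool(yarn_apps, "vcore_seconds")
impala_totals = usage_by_pool(impala_queries, "cpu_time_ms")

for pool, share in sorted(usage_share(yarn_totals).items()):
    print(f"YARN   {pool}: {share:.0%} of vcore-seconds")
for pool, share in sorted(usage_share(impala_totals).items()):
    print(f"Impala {pool}: {share:.0%} of CPU time")
```

YARN and Impala are kept as separate totals here because their KPIs are not directly comparable. Combining them into one number per tenant would need a weighting that you would have to decide on yourselves.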
12-01-2017 07:52 AM
Thanks for the reply and information.
We do have Resource Pools set up now, but they are organized more by function. From what you are saying (and from the few things I have seen), the only way for us to get what we want is to create resource pools per project/customer.
I had forgotten that Impala usage is not included. Thanks for mentioning that too.