I'm wondering if the community has put together a tool to evaluate the use of a given table in a cluster.
I'm thinking it would be something like an Ambari View that would allow me to select a table and see statistics like:
- How often is it queried (per day, per week, etc)
- Which user(s) use it most
- What is the average size of a query against that table (e.g., GB of data scanned)
- What is the size of the data in the table on disk
Most of this can be pieced together from the Tez View in Ambari, so it seems like it would just be similar types of queries against the same metadata source(s).
Does anything like this exist?
With SmartSense, you can install Activity Analyzer and Activity Explorer, which can parse all the job runs within the cluster. They do not provide metrics at the table level; instead, they provide information at the job level. You will have to make a few modifications: extract the table names from the actual query text and join them with the SmartSense job metrics to capture your per-table metrics.
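Here is a minimal Python sketch of that extract-and-join step. It assumes the job records (user, submission date, query text, bytes read) have already been exported from Activity Analyzer into a list. `JobRecord`, `extract_tables`, and `table_usage` are hypothetical names, not part of SmartSense. Table names are pulled out with a naive regex over `FROM`/`JOIN` clauses, which stands in for a real HiveQL parser. Each job's bytes read are attributed in full to every table the query references.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


# Hypothetical record shape: assumes job metadata has already been exported
# from SmartSense Activity Analyzer into this form.
@dataclass
class JobRecord:
    user: str
    day: date
    query: str
    bytes_read: int


# Naive pattern: captures an identifier (optionally "db.table") after FROM or JOIN.
# It does not handle CTE names, comments, or quoted identifiers.
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)", re.IGNORECASE
)


def extract_tables(query: str) -> set[str]:
    """Return the lower-cased table names referenced after FROM/JOIN."""
    return {name.lower() for name in TABLE_PATTERN.findall(query)}


def table_usage(jobs):
    """Aggregate query count, queries per day, top users, and average GB read per table."""
    counts = Counter()
    per_day = defaultdict(Counter)
    users = defaultdict(Counter)
    bytes_per_table = defaultdict(list)
    for job in jobs:
        # A table referenced twice in one query is counted once for that job.
        for table in extract_tables(job.query):
            counts[table] += 1
            per_day[table][job.day] += 1
            users[table][job.user] += 1
            bytes_per_table[table].append(job.bytes_read)
    return {
        table: {
            "queries": counts[table],
            "queries_per_day": dict(per_day[table]),
            "top_users": users[table].most_common(3),
            "avg_gb_read": mean(bytes_per_table[table]) / 1024**3,
        }
        for table in counts
    }
```

Weekly counts can be derived by grouping `queries_per_day` on `day.isocalendar()`. Table size on disk is not covered by this sketch; `hdfs dfs -du -s` on the table's location reports it directly.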