Hi, I am working with the Hadoop stack (Hive, Ambari, Ranger) and am planning to create a dashboard that provides the following information:
-Execution time of a Hive query.
-Size of the data generated by each created table.
-How frequently each Hive schema, each table, and each column of the respective table is used.
-Name or ID of the user/application firing each query.
-Resource usage of each application/user.
For now, I am trying to use the Hive Metastore and the Ranger audit logs to get the information above. Is there a better way to fetch it? Kindly let me know if I need to provide any more information.
A database-agnostic, high-level way to go over the metadata is the Hive metatool, which can run JDOQL queries against the metastore's object model. For example:
HIVE_CONF_DIR=/etc/hive/conf/conf.server/ hive --service metatool -executeJDOQL "select name from org.apache.hadoop.hive.metastore.model.MDatabase"
HIVE_CONF_DIR=/etc/hive/conf/conf.server/ hive --service metatool -executeJDOQL "select database.name + '.' + tableName from org.apache.hadoop.hive.metastore.model.MTable"
You can find the ORM data layouts here
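If you plan to feed a dashboard from these queries, it may help to wrap the metatool call in a small script. The following is only a sketch: the function names `run_jdoql`, `list_databases`, and `list_tables`, as well as the `HIVE_BIN` variable, are hypothetical and not part of Hive. It assumes the same `HIVE_CONF_DIR` as in the commands above, and the two JDOQL queries are the ones already shown.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Hive): runs one JDOQL query through the
# metatool. HIVE_BIN and the default config path are assumptions; adjust
# them for your cluster.
run_jdoql() {
  local query="$1"
  HIVE_CONF_DIR="${HIVE_CONF_DIR:-/etc/hive/conf/conf.server/}" \
    "${HIVE_BIN:-hive}" --service metatool -executeJDOQL "$query"
}

# List all database names (same query as in the first command above).
list_databases() {
  run_jdoql "select name from org.apache.hadoop.hive.metastore.model.MDatabase"
}

# List all tables as database.table (same query as in the second command above).
list_tables() {
  run_jdoql "select database.name + '.' + tableName from org.apache.hadoop.hive.metastore.model.MTable"
}
```

After sourcing the script, something like `list_tables > tables.txt` would dump the table list to a file. The metatool output may contain log lines in addition to the query results, so you will probably need to filter it before loading it into the dashboard.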