Can someone help me with Cloudera Manager charts? I couldn't find a good explanation of the charts available in Cloudera Manager. How can I correlate errors with charts, e.g. the CPU- and I/O-related charts?
How can I map this information to charts, e.g. the Hive canary chart? How can I tell whether a chart is associated with an issue? I need any detailed information about the charts and about how to correlate an issue with them.
If you don't mind, could you tell us which charts you would like to understand, and which errors you would like to correlate with them? That will give us a chance to help you understand Cloudera Manager --> Charts.
This is a Hive Metastore health test that checks that a client can connect and perform basic operations. The operations include: (1) creating a database, (2) creating a table within that database with several types of columns and two partition keys, (3) creating a number of partitions, and (4) dropping both the table and the database. The database is created under the /user/hue/.cloudera_manager_hive_metastore_canary/<Hive Metastore role name>/ and is named "cloudera_manager_metastore_canary_test_db".

The test returns "Bad" health if any of these operations fail. The test returns "Concerning" health if an unknown failure happens. The canary publishes a metric 'canary_duration' for the time it took for the canary to complete. Here is an example of a trigger, defined for the Hive Metastore role configuration group, that changes the health to "Bad" when the duration of the canary is longer than 5 sec:

    IF (SELECT canary_duration WHERE entityName=$ROLENAME AND category = ROLE and last(canary_duration) > 5s) DO health:bad

A failure of this health test may indicate that the Hive Metastore is failing basic operations. Check the logs of the Hive Metastore and the Cloudera Manager Service Monitor for more details. This test can be enabled or disabled using the Hive Metastore Canary Health Test Hive Metastore monitoring setting.
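To make the trigger's condition easier to relate to the canary duration chart, here is a rough Python sketch of the same check: take the most recent canary_duration sample and compare it to the 5-second threshold. This is not Cloudera Manager code or its API. The function name, the list-of-seconds input, and the "unknown" result for an empty series are all assumptions made for illustration.

```python
from typing import Sequence


def canary_health(durations_s: Sequence[float], threshold_s: float = 5.0) -> str:
    """Mimic the example trigger: health is 'bad' when the last sample exceeds the threshold.

    durations_s: hypothetical canary_duration samples in seconds, oldest first.
    threshold_s: the 5 s limit taken from the example trigger.
    """
    if not durations_s:
        # Assumption: with no samples there is nothing to evaluate.
        return "unknown"
    last_sample = durations_s[-1]  # corresponds to last(canary_duration)
    return "bad" if last_sample > threshold_s else "good"


if __name__ == "__main__":
    print(canary_health([1.2, 0.9, 6.5]))  # last sample is above 5 s
    print(canary_health([6.5, 0.9, 1.2]))  # only an older sample is above 5 s
```

Because only the latest sample is compared, a single slow run that has already been followed by a fast one would not trip this check, which is worth keeping in mind when reading spikes on the chart.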
Check how much JVM heap was in use for the service (HS2, etc.) for which you received this alert.
From the alert, you can see that the JVM pause took more than 2 minutes, and you have configured the alert to fire if GC pauses take 60% of 1 minute. Look at the JVM Heap Memory Usage and GC pause charts for the service that raised the alert. If the heap is constantly high, that is the likely cause. In that case, the solution could be as simple as increasing the heap size.
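As a quick check of the arithmetic in that alert, the sketch below works out what "60% of 1 minute" means in seconds and compares an observed pause against it. It also shows one possible way to decide whether heap usage is "constantly high". The function names, the 90% cutoff, and the sample values are assumptions for illustration, not anything Cloudera Manager exposes.

```python
from typing import Sequence


def pause_limit_seconds(window_s: float = 60.0, threshold_pct: float = 60.0) -> float:
    """Seconds of GC pause allowed in the window before the alert fires."""
    return window_s * threshold_pct / 100.0


def gc_pause_exceeds(pause_s: float, window_s: float = 60.0, threshold_pct: float = 60.0) -> bool:
    """True when the observed pause is longer than the configured share of the window."""
    return pause_s > pause_limit_seconds(window_s, threshold_pct)


def heap_constantly_high(used_mb: Sequence[float], max_mb: float, high_fraction: float = 0.9) -> bool:
    """Assumed definition of 'constantly high': every sample is above high_fraction of max heap."""
    return bool(used_mb) and all(u > high_fraction * max_mb for u in used_mb)


if __name__ == "__main__":
    print(pause_limit_seconds())    # 60% of a 60 s window
    print(gc_pause_exceeds(120.0))  # a 2-minute pause against that limit
    # Hypothetical heap samples (MB) read off the JVM Heap Memory Usage chart.
    print(heap_constantly_high([3800, 3900, 3950], max_mb=4096))
```

With the values from the alert, the limit is 36 seconds, so a pause of 2 minutes or more is well past it.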