Understanding Cloudera Manager charts

Contributor
Can someone help me understand the charts in Cloudera Manager? I couldn't find a good explanation of the charts available there. How can I correlate errors with the charts, e.g. the CPU- and I/O-related charts?

Contributor

Hello manuroman,

How can I map this information to the charts, e.g. the Hive canary chart? How can I tell whether a chart is associated with an issue? I need detailed information about the charts and how to correlate an issue with them.

Rising Star

Hi Kamal,

If you don't mind, could you share which charts you would like to understand and which errors you would like to correlate with them? That way we can help you understand the charts in Cloudera Manager.

Thanks,

Senthil Kumar

Contributor
For example, Hive Metastore Canary Duration.

Rising Star
This is a Hive Metastore health test that checks that a client can connect and perform basic operations. The operations include: (1) creating a database, (2) creating a table within that database with several types of columns and two partition keys, (3) creating a number of partitions, and (4) dropping both the table and the database. The database is created under /user/hue/.cloudera_manager_hive_metastore_canary/<Hive Metastore role name>/ and is named "cloudera_manager_metastore_canary_test_db". The test returns "Bad" health if any of these operations fail, and "Concerning" health if an unknown failure happens.

The canary publishes a metric, 'canary_duration', for the time it took the canary to complete. Here is an example of a trigger, defined for the Hive Metastore role configuration group, that changes the health to "Bad" when the duration of the canary is longer than 5 seconds:

"IF (SELECT canary_duration WHERE entityName=$ROLENAME AND category = ROLE and last(canary_duration) > 5s) DO health:bad"

A failure of this health test may indicate that the Hive Metastore is failing basic operations. Check the logs of the Hive Metastore and the Cloudera Manager Service Monitor for more details. This test can be enabled or disabled using the Hive Metastore Canary Health Test Hive Metastore monitoring setting.

Ref: https://www.cloudera.com/documentation/enterprise/5-7-x/topics/cm_ht_hive_metastore_server.html#conc...
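The example trigger boils down to a threshold check on the most recent canary_duration value, which is also how to read the Hive Metastore Canary Duration chart: a point above the limit is where such a trigger would mark the role Bad. The Python sketch below only illustrates that logic. The CanarySample type, the "UNKNOWN" result for missing data, and the 5-second default are assumptions for illustration; this is not how Cloudera Manager evaluates triggers internally, and the "Concerning" state is not modelled.

```python
from dataclasses import dataclass

# Hypothetical default mirroring the example trigger: last(canary_duration) > 5s
CANARY_DURATION_LIMIT_S = 5.0


@dataclass
class CanarySample:
    timestamp: float   # seconds since epoch (hypothetical sample format)
    duration_s: float  # canary_duration value for that run, in seconds


def canary_health(samples: list[CanarySample],
                  limit_s: float = CANARY_DURATION_LIMIT_S) -> str:
    """Return "BAD" if the latest canary run took longer than limit_s, else "GOOD".

    "UNKNOWN" for an empty series is an illustrative choice, not a CM state.
    """
    if not samples:
        return "UNKNOWN"
    # Equivalent of last(canary_duration): pick the most recent sample.
    latest = max(samples, key=lambda s: s.timestamp)
    # Strictly greater than, as in the trigger's "> 5s".
    return "BAD" if latest.duration_s > limit_s else "GOOD"
```

Only the latest value matters here, so a single slow run flips the result, and the next fast run flips it back; that matches the "last(...)" function used in the example trigger.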

Contributor

For example, suppose I receive this alert for the Activity Monitor:

Pause Duration Bad
Average time spent paused was 2 minute(s), 54 second(s) (290.37%) per minute over the previous 5 minute(s). Critical threshold: 60.00%.
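The percentage in this alert can be checked by hand. Assuming the alert expresses the average paused time per minute as a share of 60 seconds (this is consistent with the numbers shown, since 290.37% of 60 s is about 174.2 s, i.e. 2 min 54 s), a minimal Python sketch of the arithmetic and the threshold comparison is:

```python
def pause_percentage(paused_seconds_per_minute: float) -> float:
    """Average paused time per minute as a percentage of one minute (60 s).

    Assumption: this mirrors how the alert's percentage is derived; it matches
    the numbers in the alert but is not taken from Cloudera Manager's code.
    """
    return paused_seconds_per_minute / 60.0 * 100.0


CRITICAL_THRESHOLD_PCT = 60.0  # "Critical threshold: 60.00%" from the alert

# "2 minute(s), 54 second(s)" as displayed (rounded to whole seconds)
paused_s = 2 * 60 + 54
pct = pause_percentage(paused_s)
print(f"{pct:.2f}% of each minute paused; critical: {pct > CRITICAL_THRESHOLD_PCT}")
```

The small gap between the computed 290.00% and the alert's 290.37% comes from the alert displaying the pause time rounded to whole seconds.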

There are various charts, such as disk latency, disk throughput, network throughput, and garbage collection time.

How can I find out what causes this problem in the system? Will any specific charts help me here?

Rising Star

This is a garbage collection (GC) pause.

Check how much JVM heap was being used by the service (HS2, etc.) for which you received this alert.

From the alert, you can see that the JVM spent on average 2 minutes 54 seconds paused per minute (290.37%), while the alert is configured to fire when pauses exceed 60% of each minute. Check the JVM Heap Memory Usage and GC pause charts for the service that raised this alert. If the heap is constantly high, that is the likely cause, and the solution could be as simple as increasing the heap size.

See the Cloudera documentation [1][2].

[1] https://www.cloudera.com/documentation/enterprise/5-7-x/topics/cm_ht_hiveserver2.html

[2] https://www.cloudera.com/documentation/enterprise/5-7-x/topics/cm_ht_hive_metastore_server.html