Member since: 10-29-2015
Posts: 116
Kudos Received: 26
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 143 | 06-27-2024 02:42 AM |
| | 1734 | 06-24-2022 09:06 AM |
| | 2962 | 01-19-2021 06:56 AM |
| | 52892 | 01-18-2016 06:59 PM |
03-07-2019
03:09 AM
Thank you for the reply, Harsh J. Could you please help me with a quick command or script that uses 'lsof' to identify avoidable open files, or files stuck in some process, and advise on what actions to take next? I tried running a generic 'lsof | grep java', but it returned a huge list of files, which made it difficult to pick out the relevant information. Thanks snm1523
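One way to narrow the output down is to look at a single process and group its descriptors by kind instead of reading the full 'lsof' listing. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/fd is readable by the user running it (typically the process owner or root). The function names and the category labels are hypothetical and chosen only for illustration; this is not a Cloudera tool.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Bucket a /proc/<pid>/fd symlink target into a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    if target.endswith(" (deleted)"):
        # File was unlinked but is still held open by the process.
        return "deleted_file"
    return "file"


def summarize_targets(targets):
    """Count descriptors per category for a list of symlink targets."""
    return Counter(classify_fd_target(t) for t in targets)


def read_fd_targets(pid: int):
    """Return the symlink targets of every open descriptor of `pid`."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # Descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Calling `summarize_targets(read_fd_targets(pid))` for the PID of the role in question gives a per-category count. A large "deleted_file" or "socket" count is usually a more useful starting point than the raw list.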
03-06-2019
09:13 AM
Hello All, I am looking for best practices or recommendations for choosing a value for the rlimit_fds (Maximum Process File Descriptors) property. It is currently set to the default, 32768, and we are getting File Descriptor Threshold alerts. Is there a formula, a practice, or a set of checks that can be used to determine a good value? Thanks snm1523
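As a starting point for sizing, it may help to compare how many descriptors a role actually has open against its effective limit. The sketch below assumes a Linux host and parses the "Max open files" row of /proc/<pid>/limits. The `suggest_limit` helper and its `target_utilization` parameter are hypothetical, an illustrative headroom heuristic and not an official Cloudera recommendation.

```python
import math


def parse_max_open_files(limits_text: str):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            parts = line.split()
            # Row layout: Max open files <soft> <hard> files
            return int(parts[3]), int(parts[4])
    raise ValueError("no 'Max open files' row found")


def fd_usage_percent(open_count: int, soft_limit: int) -> float:
    """Percentage of the soft limit currently in use."""
    return 100.0 * open_count / soft_limit


def suggest_limit(peak_open: int, target_utilization: float = 0.5) -> int:
    """Smallest limit that keeps the observed peak at or below the
    chosen utilization (assumed heuristic, for illustration only)."""
    return math.ceil(peak_open / target_utilization)
```

For example, with the limits text read from `/proc/<pid>/limits` and an open-descriptor count taken from `/proc/<pid>/fd`, `fd_usage_percent` shows how close the role is to its limit. Sampling this over several days gives a peak value to feed into `suggest_limit`.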
Labels: Apache HBase, Cloudera Manager
02-25-2019
09:56 AM
Hello gzigldrum, Thank you for the guidance. This certainly helps. I will go through both of the KB articles you shared and also review the Navigator Server and CM Agent logs for the audit health warnings. I will come back with the findings. In the meantime, any luck with HIVEMETASTORE_CANARY_HEALTH and NODE_MANAGER_WEB_METRIC_COLLECTION? Thanks snm1523
02-22-2019
08:17 AM
Hello Gzigldrum, Thank you for the reply. Below are the exact health messages from CM for each alert:

HIVEMETASTORE_CANARY_HEALTH: The health test result for HIVEMETASTORE_CANARY_HEALTH has become bad: The Hive Metastore canary failed to create a database. OR The health test result for HIVEMETASTORE_CANARY_HEALTH has become bad: The Hive Metastore canary failed to create a partition in the table it created. OR The health test result for HIVEMETASTORE_CANARY_HEALTH has become bad: The Hive Metastore canary failed to drop the table it created.

REGION_SERVER_AUDIT_HEALTH: The health test result for REGION_SERVER_AUDIT_HEALTH has become bad: There is a problem processing audits for REGIONSERVER.

IMPALAD_QUERY_MONITORING_STATUS: The health test result for IMPALAD_QUERY_MONITORING_STATUS has become bad: There are 1 error(s) seen monitoring executing queries, and 0 errors(s) seen monitoring completed queries for this role in the previous 5 minute(s). Critical threshold: any.

HIVESERVER2_SCM_HEALTH: The health test result for HIVESERVER2_SCM_HEALTH has become bad: This role's process is starting. This role is supposed to be started.

NAME_NODE_AUDIT_HEALTH: The health test result for NAME_NODE_AUDIT_HEALTH has become bad: There is a problem processing audits for NAMENODE.

NODE_MANAGER_WEB_METRIC_COLLECTION: The health test result for NODE_MANAGER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.

These health alerts occur every 2-3 days, within a window of 2-3 hours, and create around 5-10 tickets each time. We also checked with the network team to verify whether there was a network outage or glitch during those windows, but found nothing. We have tried to diagnose the problem through the logs for each alert, but have not found anything interesting. Hence, we are looking for more guidance on what else could be checked to identify the root cause. Thanks snm1523
02-21-2019
07:11 AM
Thank you for the quick replies, Gzigldrum. I have created a separate post explaining the alerts we get, titled "Flooded with failed health test alerts". For every alert we get an incident in BMC, because HP OVO is configured to generate an incident and assign it to us. Every morning we start with around 15-20 such incidents, followed by another 15-20 during the rest of the day. It would be great if you could post some suggestions for troubleshooting those alerts and fixing them permanently. Note: I will accept your previous reply as the solution to this post. Thanks snm1523
02-21-2019
06:41 AM
Hello Gzigldrum, Thank you for the reply. However, the instructions provided in the KB article and values suggested by you are already applied. Please advise. Thanks Snm1523
02-21-2019
06:14 AM
Hello All, I need some suggestions on the exact reason the health tests below fail multiple times, almost 3 times every day, generating at least 3-4 alerts each time. 1. HIVEMETASTORE_CANARY_HEALTH 2. REGION_SERVER_AUDIT_HEALTH 3. IMPALAD_QUERY_MONITORING_STATUS 4. HIVESERVER2_SCM_HEALTH 5. NAME_NODE_AUDIT_HEALTH 6. NODE_MANAGER_WEB_METRIC_COLLECTION Any help or suggestion to permanently fix these alerts would be of great help. Guidance on how to reach the root cause would also be helpful. Thanks snm1523
02-21-2019
03:54 AM
Hello All, We have an HP OVO monitoring tool that monitors all the alerts from Cloudera Manager and raises a BMC Remedy incident for each one. However, we are at times flooded with these monitoring alert tickets, and when we immediately check the cluster, everything looks green. When we dig in for a detailed analysis and check the logs for the respective alert, we find nothing major, and the service is green. It looks like it heals by itself. However, this flood of tickets raises concerns with management and prompts questions about a root cause, which we do not have and cannot find, because the issue fixed itself without leaving any particular traces behind. I am looking for recommendations or best practices on how exactly this should be set up so that we get only the actual, required alerts. Is there any configuration we need to do in CM, or is there something that can be configured in HP OVO? Any suggestions would be of great help. Suggestions on what should be checked while troubleshooting these alerts would also be welcome. Thanks snm1523
Labels: Cloudera Manager
02-15-2019
02:38 AM
Hello, I would like to know which activities are performed by which users in Cloudera Manager. For example, which user restarted a service or moved a cluster or service into maintenance mode. It would be great if anyone could share some information on this. Thanks snm1523
Labels: Cloudera Manager