06-16-2019 07:55 PM - last edited on 06-17-2019 05:48 AM by cjervis
CDH VERSION: 5.13.3-1.cdh5.13.3.p0.2
Cloudera Express 5.13.3 (#6 built by jenkins on 20180328-1830 git: f90c58536c252d70a23bde6d94514d92a1f111d4)
Java VM : Java HotSpot(TM) 64-Bit Server VM
Java VM Provider: Oracle Corporation
Java Version: 1.8.0_121
This happened while I was enabling permissions. I installed Sentry and did the related configuration, including Hadoop LDAP group mapping, Hive integration with Sentry and LDAP authentication, Impala integration with Sentry and LDAP authentication, and Hue integration with Sentry and LDAP authentication. Then I deployed all client configurations, but deploying the Spark client configuration always timed out on some nodes. I retried many times, which made Cloudera Manager very slow, and I found "APPARENT DEADLOCK" in the cloudera-scm-server.log file, so I had to restart the Cloudera Manager server.
I then ignored the client configuration deployment timeout and began restarting the cluster. We found that starting and stopping the NameNode, DataNode, Impala, and Hive roles always timed out on these nodes. DataNode roles sometimes failed to start with the log message "receive signal 15", which I took to mean that some disk-related condition was not met.
I used a temporary workaround to solve this problem: reinstalling the node, including the agent and all roles.
I think the problem is in the agent; maybe its metadata is inconsistent.
I really need to know why this happened. I have googled for it but found nothing useful. I look forward to your advice. Thank you very much.
06-17-2019 06:07 AM
CM commands (e.g. restart a role on a cluster node) time out after 150 seconds if there is no response sent back during this time. This happens e.g. if the CM agent on the cluster node is not heartbeating into CM server. Please
If the host is not heartbeating in, the restart command will fail. The CM agent logs will hopefully show details; a quick resolution may be to restart the CM agent on that node with the command
# service cloudera-scm-agent restart
06-18-2019 08:13 PM
Hi gzigldrum, thank you for your response.
We use MySQL to store the CM server metadata.
From the log, there were 3 managed threads and about 65 pending tasks. I don't know why this happened.
But I found some information here:
It recommends increasing numHelperThreads and maxAdministrativeTaskTime, and setting statementCacheNumDeferredCloseThreads to 1.
But I can't find any configuration option for these in the CM web GUI or the config file.
This log shows that the command ID is null, which suggests there was a problem saving the command to MySQL, so the CM web console couldn't get a valid command response and kept waiting until it finally timed out. Is that right?
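To gauge how often the warning appears, a minimal sketch like the following may help. The function name count_deadlock_warnings is hypothetical, and the log path in the usage comment is an assumption about a default CM 5 install; adjust it to wherever your server log lives.

```shell
# Count lines containing "APPARENT DEADLOCK" in a log file.
# count_deadlock_warnings is a hypothetical helper, not a CM tool.
count_deadlock_warnings() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # grep -c exits non-zero when the count is 0; "|| true" keeps that from being an error.
  grep -c "APPARENT DEADLOCK" "$log_file" || true
}

# Usage (path is an assumption):
#   count_deadlock_warnings /var/log/cloudera-scm-server/cloudera-scm-server.log
```

A count that keeps growing between checks suggests the condition is ongoing rather than a one-off.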
06-19-2019 01:55 AM
Please set "-Xmx4G" in /etc/default/cloudera-scm-server and restart CM, then test again. The apparent deadlock messages are an indicator of a low-memory condition and are misleading; we need to make sure to cover the most likely root causes first.
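As a rough sketch, the heap change might look like the excerpt below. It assumes the file sets the server JVM options through a CMF_JAVA_OPTS variable, which is typical for CM 5 packages but should be checked against your own file. Change only the -Xmx value and leave any other options already on that line in place.

```shell
# /etc/default/cloudera-scm-server (excerpt)
# Assumption: JVM options are set via CMF_JAVA_OPTS. Other options that
# your file already lists on this line are omitted here; keep them.
export CMF_JAVA_OPTS="-Xmx4G"

# Then restart the CM server as root:
# service cloudera-scm-server restart
```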
06-24-2019 11:17 PM
I found the root cause: Logstash was running at full CPU, which made other processes run very slowly.
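For anyone checking for the same kind of CPU contention, here is a minimal sketch that ranks processes by CPU usage. The helper name top_by_cpu is hypothetical; it reads ps-style rows on stdin and assumes the second column is the CPU percentage.

```shell
# Read rows of "PID %CPU COMMAND" (with one header line) on stdin and
# print the N rows with the highest %CPU. top_by_cpu is a hypothetical helper.
top_by_cpu() {
  n="${1:-5}"
  # Drop the header, sort numerically on column 2 (descending), keep N rows.
  tail -n +2 | sort -k2,2nr | head -n "$n"
}

# Usage on a cluster node:
#   ps -eo pid,pcpu,comm | top_by_cpu 5
```

A process that stays at the top of this list with very high CPU across repeated runs is a candidate for starving the CM agent and the role processes.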