
Some nodes deploy client configuration timeout. then, start namenode,impala, hive roles timeout too.

New Contributor

CDH VERSION: 5.13.3-1.cdh5.13.3.p0.2

Cloudera Express 5.13.3 (#6 built by jenkins on 20180328-1830 git: f90c58536c252d70a23bde6d94514d92a1f111d4)

Java VM : Java HotSpot(TM) 64-Bit Server VM

Java VM Provider: Oracle Corporation

Java Version: 1.8.0_121

 

This happened while I was enabling permissions. I installed Sentry and did the related configuration: Hadoop LDAP group mapping, Hive with Sentry and LDAP authentication, Impala with Sentry and LDAP authentication, and Hue with Sentry and LDAP authentication. Then I deployed all client configurations, but deploying the Spark client configuration always timed out on some nodes. I retried many times, which made Cloudera Manager very slow, and I found "APPARENT DEADLOCK" in the cloudera-scm-server.log file, so I had to restart the Cloudera Manager server. I then ignored the client configuration timeout and began restarting the cluster. Starting and stopping the NameNode, DataNode, Impala, and Hive roles always timed out on these same nodes. DataNode roles sometimes failed to start with the log message "receive signal 15", which I took to mean that some disk-related condition was not met.

 

As a temporary workaround, I reinstalled the node, including the agent and all roles.

I think the problem is in the agent; maybe its metadata is inconsistent.

 

I really need to know why this happened. I have googled for it but found nothing useful. I look forward to your advice. Thank you very much.


Re: Some nodes deploy client configuration timeout. then, start namenode,impala, hive roles timeout

Expert Contributor

CM commands (e.g. restarting a role on a cluster node) time out after 150 seconds if no response is sent back during this time. This happens, for example, if the CM agent on the cluster node is not heartbeating to the CM server. Please:

  • increase the heap memory for the CM server in /etc/default/cloudera-scm-server (to avoid the apparent deadlock issue, which indicates an out-of-memory condition)
  • make sure that all hosts are in a good health state and heartbeating in on the CM -> Hosts -> All Hosts page (the "Last Heartbeat" value needs to be lower than 15 seconds)

If a host is not heartbeating in, the restart command will fail. The CM agent logs will hopefully show details; a quick resolution may be to restart the CM agent on that node with the command:

# service cloudera-scm-agent restart
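
Before restarting, it may help to look at what the agent has been logging. The following is only a sketch: `recent_agent_errors` is a hypothetical helper name, and the log path is an assumption (the usual default location), so adjust it if your installation differs.

```shell
#!/bin/sh
# Assumed default agent log location; override AGENT_LOG if yours differs.
AGENT_LOG="${AGENT_LOG:-/var/log/cloudera-scm-agent/cloudera-scm-agent.log}"

# Hypothetical helper: print the last N (default 20) lines containing
# "ERROR" from the given log file.
recent_agent_errors() {
    log="$1"
    count="${2:-20}"
    if [ -r "$log" ]; then
        grep 'ERROR' "$log" | tail -n "$count"
    else
        echo "cannot read $log" >&2
        return 1
    fi
}

# Typical use, as root on the affected node:
#   service cloudera-scm-agent status
#   recent_agent_errors "$AGENT_LOG"
```

If the recent errors mention failed heartbeats, that would be consistent with the timeout behaviour described above.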

Re: Some nodes deploy client configuration timeout. then, start namenode,impala, hive roles timeout

New Contributor

Hi gzigldrum, thank you for your response.

 

We use MySQL to store the CM server metadata.

 

  • I checked and found that all hosts are in a good health state.
  • I checked the heap memory for CM and there is no OOM dump file. CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"

 

According to the log, there were 3 managed threads and about 65 pending tasks. I don't know why this happened.

However, I found some information here:

https://www.mchange.com/projects/c3p0/#other_ds_configuration

https://www.mchange.com/projects/c3p0/#configuring_statement_pooling

It recommends increasing numHelperThreads and maxAdministrativeTaskTime, and setting statementCacheNumDeferredCloseThreads to 1.

 

But I can't find any configuration option for these in the CM web GUI or in a config file.

[Screenshot attachment: 深度截图_选择区域_20190617182045.png]

 

This log shows that the command ID is null, which suggests there was a problem saving the command to MySQL, so the CM web console could not get a valid command response; it kept waiting and eventually timed out. Is that right?

[Screenshot attachment: 深度截图_选择区域_20190618135955.png]

Re: Some nodes deploy client configuration timeout. then, start namenode,impala, hive roles timeout

Expert Contributor

Please set "-Xmx4G" in /etc/default/cloudera-scm-server, restart CM, and then test again. The apparent deadlock messages are misleading: they are an indicator of a low-memory condition, and we need to make sure we cover the most likely root causes first.
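
With the options quoted earlier in this thread, the change amounts to replacing -Xmx2G with -Xmx4G on the CMF_JAVA_OPTS line. A minimal sketch, assuming the file contains that line in the form shown above; `set_cm_heap` is a hypothetical helper name, and it is worth backing up the file first:

```shell
#!/bin/sh
# Hypothetical helper: rewrite the -Xmx value on lines mentioning CMF_JAVA_OPTS.
# Usage: set_cm_heap FILE SIZE   (e.g. set_cm_heap /etc/default/cloudera-scm-server 4G)
set_cm_heap() {
    file="$1"
    size="$2"
    tmp="$(mktemp)" || return 1
    # Edit into a temp file, then copy the contents back so the original
    # file keeps its ownership and permissions.
    sed "/CMF_JAVA_OPTS/ s/-Xmx[0-9][0-9]*[A-Za-z]*/-Xmx${size}/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Typical use, as root on the CM server host:
#   cp /etc/default/cloudera-scm-server /etc/default/cloudera-scm-server.bak
#   set_cm_heap /etc/default/cloudera-scm-server 4G
#   service cloudera-scm-server restart
```

The other JVM flags on the line are left untouched; only the maximum heap size changes.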

 

Re: Some nodes deploy client configuration timeout. then, start namenode,impala, hive roles timeout

New Contributor

Hi gzigldrum,

 

I found the root cause: Logstash was running at full CPU on those nodes, which made other processes run very slowly.
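
For anyone hitting similar timeouts, a quick way to spot this kind of CPU starvation is to list the busiest processes on a slow node. A minimal sketch, assuming a Linux host with the procps version of `ps`; `top_cpu` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print a header line plus the N (default 5)
# processes with the highest CPU usage as reported by ps.
top_cpu() {
    n="${1:-5}"
    ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n "$((n + 1))"
}

# Typical use on a node where commands time out:
#   top_cpu 5
```

Note that `ps` reports CPU usage averaged over each process's lifetime, so `top` may give a better picture of what is busy right now.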

Re: Some nodes deploy client configuration timeout. then, start namenode,impala, hive roles timeout

Expert Contributor

Thanks for reporting back the root cause you've found!