Created on 03-07-2019 08:47 AM - edited 03-07-2019 08:58 AM
After using the Cloudera wizard to add a host to the cluster, the agent log on the new host shows the error below:
[07/Mar/2019 18:54:41 +0000] 20033 MainThread agent INFO To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[07/Mar/2019 18:54:43 +0000] 20033 MainThread supervisor INFO Trying to connect to supervisor (Attempt 1)
[07/Mar/2019 18:54:43 +0000] 20033 MainThread supervisor INFO Supervisor version: 3.0, pid: 18803
[07/Mar/2019 18:54:43 +0000] 20033 MainThread supervisor INFO Successfully connected to supervisor
[07/Mar/2019 18:54:43 +0000] 20033 MainThread agent INFO Supervisor version: 3.0, pid: 18803
[07/Mar/2019 18:54:43 +0000] 20033 MainThread agent INFO Connecting to previous supervisor: agent-18803-1551977586.
[07/Mar/2019 18:54:45 +0000] 20033 MainThread supervisor INFO Triggering supervisord update.
[07/Mar/2019 18:54:45 +0000] 20033 MainThread _cplogging INFO [07/Mar/2019:18:54:45] ENGINE Bus STARTING
[07/Mar/2019 18:54:45 +0000] 20033 MainThread _cplogging INFO [07/Mar/2019:18:54:45] ENGINE Started monitor thread '_TimeoutMonitor'.
[07/Mar/2019 18:54:45 +0000] 20033 MainThread _cplogging INFO [07/Mar/2019:18:54:45] ENGINE Serving on http://127.0.0.1:9001
[07/Mar/2019 18:54:45 +0000] 20033 MainThread _cplogging INFO [07/Mar/2019:18:54:45] ENGINE Bus STARTED
[07/Mar/2019 18:54:45 +0000] 20033 MainThread daemon INFO New monitor: (<cmf.monitor.host.HostMonitor object at 0x7f58b56cdb10>,)
[07/Mar/2019 18:54:45 +0000] 20033 MonitorDaemon-Scheduler daemon INFO Monitor ready to report: ('HostMonitor',)
[07/Mar/2019 18:54:45 +0000] 20033 MainThread agent INFO Setting default socket timeout to 45
[07/Mar/2019 18:54:45 +0000] 20033 MainThread agent INFO Previously active parcels: {'SPARK2': '2.3.0.cloudera4-1.cdh5.13.3.p0.611179', 'CDH': '5.14.4-1.cdh5.14.4.p0.3'}
[07/Mar/2019 18:54:45 +0000] 20033 MainThread agent INFO Loading last saved hb response to complete initialization: /var/lib/cloudera-scm-agent/response.avro
[07/Mar/2019 18:54:45 +0000] 20033 Monitor-HostMonitor network_interfaces INFO NIC iface virbr0 doesn't support ETHTOOL (95)
[07/Mar/2019 18:54:45 +0000] 20033 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.02 min:0.02 mean:0.02 max:0.02 LIFE_MAX:0.02
[07/Mar/2019 18:55:52 +0000] 20033 Monitor-HostMonitor throttling_logger ERROR Timed out waiting for worker process collecting filesystem usage to complete. This may occur if the host has an NFS or other remote filesystem that is not responding to requests in a timely fashion. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/user/1000,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/0
and the CM wizard displays this error message:
Failed to receive heart beat from agent
Created 03-07-2019 10:42 PM
Hello @Exor,
Let's step back a bit. To understand the issue better, could you please help with a couple of preliminaries:
Hope this gives more clarity on next steps.
Created on 03-08-2019 04:06 AM - edited 03-08-2019 04:32 AM
Created 07-29-2019 07:33 AM
Hi! We have the same problem! In the log details we found this error: "Monitor-HostMonitor throttling_logger ERROR Timed out waiting for worker process collecting filesystem usage to complete. This may occur if the host has an NFS or other remote filesystem that is not responding to requests in a timely fashion. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/1001,/run/user/1003,/run/user/0"
So far we have tried installing Cloudera Manager 6.2.0 and 6.1.1, but the result is the same. The agent host has no problem connecting to the Manager host: we checked with the command "telnet <Cloudera manager machine ip address or name> 7182", which connected successfully, and "ss -anp" showed an "established" connection on both hosts.
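For anyone who wants to repeat the port check without telnet, here is a minimal Python sketch. The function name `can_connect` and the 5-second timeout are illustrative choices; only port 7182 comes from the post above. It confirms only that a TCP connection opens, not that agent heartbeats are accepted.

```python
import socket


def can_connect(host: str, port: int = 7182, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run on the agent host, `can_connect("<Cloudera manager machine ip address or name>")` should return True if the telnet test succeeds.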
Created 07-30-2019 11:14 AM
Hi @Rasgeado ,
When you say that you have the same problem, what is the issue exactly? After the addition of a host fails, if you open up CM and view the new host in the Hosts tab, does it show in bad health? If so, click on that host to view any host health errors. This will give us the first clue.
Next, review the agent log on that host (normally /var/log/cloudera-scm-agent/cloudera-scm-agent.log).
While it is possible that the HostMonitor Error is related, it is not likely since the timeout is 2 seconds. More information about the problem would be good so we can come up with good possible causes.
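To rule the HostMonitor error in or out, one option is to time a filesystem usage call on each mount named in the message. The sketch below is not the agent's own collection code. It runs `os.statvfs` in a child process so that an unresponsive mount cannot block the caller. The name `probe_mount` is illustrative, and the 2-second default simply mirrors the timeout mentioned above.

```python
import subprocess
import sys


def probe_mount(path: str, timeout: float = 2.0) -> str:
    """Run os.statvfs(path) in a child process; return 'ok', 'timeout', or 'error'."""
    cmd = [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path]
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort: a child stuck in uninterruptible I/O may linger after kill().
        proc.kill()
        return "timeout"
    return "ok" if returncode == 0 else "error"
```

Calling `probe_mount` for each path in the "Current nodev filesystems" list (for example `/run/cloudera-scm-agent/process`) and looking for a "timeout" result would point at a mount that is slow to respond.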
Created 07-31-2019 04:48 AM
Hi @bgooley! Thanks for the reply. The main trouble is that we can't get past the "Install agents" step due to this error:
"Monitor-HostMonitor throttling_logger ERROR Timed out waiting for worker process collecting filesystem usage to complete. This may occur if the host has an NFS or other remote filesystem that is not responding to requests in a timely fashion. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/1001,/run/user/1003,/run/user/0" - which I found in "Details". There are no other errors, only this one.
If I open CM and go to the Hosts tab, no hosts have been added except the one on which CM is installed.
What additional information can I provide to help solve the issue?
Created 07-31-2019 08:55 AM
Hi @Rasgeado ,
Have you checked the /var/log/cloudera-scm-agent/cloudera-scm-agent.log file on the host you are trying to add? CM executes an scm_prepare_node script on the host, so it sounds as if the steps leading up to the heartbeat detection succeed. The most useful information, then, would be in that log.
You might look for errors or messages regarding the heartbeat.
Try restarting the agent if you don't see any errors pertaining to the heartbeat:
# service cloudera-scm-agent restart
Then review the log for any heartbeat errors or messages.
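As a rough aid for that review, the Python sketch below filters the agent log for lines worth a closer look. The log path is the one given above. The function name `heartbeat_lines` and the keyword list are assumptions and may need adjusting to the messages the agent actually emits.

```python
from pathlib import Path

# Agent log location as given in the post above.
AGENT_LOG = Path("/var/log/cloudera-scm-agent/cloudera-scm-agent.log")


def heartbeat_lines(lines, keywords=("heartbeat", "ERROR")):
    """Return the lines that contain any keyword, matched case-insensitively."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]
```

After restarting the agent, opening the log with `AGENT_LOG.open(errors="replace")` and passing the file object to `heartbeat_lines` returns only the matching entries.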
Created 07-29-2019 11:14 PM
Try to manually SSH between the Ambari host and the new host using the private/public key pair from a terminal. In some cases a first connection needs to be established so that the host is added to the known_hosts file.
Created 07-30-2019 01:21 AM
Hi, check that the hostname resolves correctly, try disabling the firewall (if it is enabled), and launch the host inspector from Ambari/CDH to identify any misconfiguration at the network level.
BR
Gianluca
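A quick way to check name resolution by hand is to resolve the hostname to an IP address and then resolve that address back. The Python sketch below does this with the standard `socket` module. The name `check_dns` is illustrative, and the `forward` and `reverse` parameters exist only so the resolvers can be swapped out for testing.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and report whether they agree."""
    ip = forward(hostname)
    reverse_name, aliases, _ = reverse(ip)
    # The reverse lookup may return the name as the primary name or as an alias.
    names = {reverse_name.lower(), *(alias.lower() for alias in aliases)}
    return {"ip": ip, "reverse": reverse_name, "consistent": hostname.lower() in names}
```

Running `check_dns` with each host's fully qualified name on both the manager and the new host, and comparing the results, would show whether forward and reverse records disagree.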
Created 07-30-2019 07:51 AM
Hi! Thanks for the answer!
We've been using Cloudera Hadoop for about two years and have installed it many times, but this is the first time we've run into this trouble.
Hostnames resolve correctly (we have our own DNS with forward and reverse zones) and there are no firewall rules on any host at all.
SELinux is disabled. Moreover, I can telnet to the CM host by its hostname from the agent hosts on port 7182, but we still see this annoying error.
We can't run the host inspector, because the hosts can't pass the "Install agents" step due to this error. Now we have no idea what the problem is or what to do.