Created on 08-20-2018 08:39 PM - last edited on 09-19-2022 11:52 AM by cjervis
Created 08-21-2018 03:13 PM
Hi @yadir Aguilar,
Your last comment was informative.
Can you please try adding the following option to the security section in "/etc/ambari-agent/conf/ambari-agent.ini" and restarting ambari-agent:
[security]
force_https_protocol=PROTOCOL_TLSv1_2
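If you would rather script the change, here is a minimal Python sketch (assumes Python 3 and that the file parses cleanly with ConfigParser; force_tls12 is a hypothetical helper, not part of Ambari). It sets force_https_protocol in the [security] section and creates the section if it is missing. ConfigParser does not keep comments when it rewrites a file, so back up ambari-agent.ini first.

```python
import configparser
import io


def force_tls12(ini_text: str, protocol: str = "PROTOCOL_TLSv1_2") -> str:
    """Return ini_text with [security] force_https_protocol set to `protocol`."""
    # interpolation=None leaves any '%' in existing values untouched
    config = configparser.ConfigParser(interpolation=None)
    # Preserve key case (ConfigParser lowercases option names by default)
    config.optionxform = str
    config.read_string(ini_text)
    if not config.has_section("security"):
        config.add_section("security")
    config.set("security", "force_https_protocol", protocol)
    out = io.StringIO()
    config.write(out)  # comments from the original text are not carried over
    return out.getvalue()
```

For example: read /etc/ambari-agent/conf/ambari-agent.ini, write force_tls12(text) back to the same file, then run ambari-agent restart.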
I think you are hitting this bug: https://issues.apache.org/jira/browse/AMBARI-17666
Reference: https://community.hortonworks.com/content/supportkb/208283/error-2018-07-16-005228887-netutilpy96-eo...
Hope this helps. Please accept the answer if it did 🙂
Created 08-20-2018 09:31 PM
Please check the ambari-agent log to find out why the agent is not sending heartbeats. The log is at /var/log/ambari-agent/ambari-agent.log
Created 08-20-2018 10:07 PM
this is the answer from ambari-agent.log:
INFO 2018-08-20 16:25:11,790 main.py:147 - loglevel=logging.INFO
INFO 2018-08-20 16:25:11,790 main.py:147 - loglevel=logging.INFO
INFO 2018-08-20 16:25:11,790 main.py:147 - loglevel=logging.INFO
INFO 2018-08-20 16:25:11,792 DataCleaner.py:39 - Data cleanup thread started
INFO 2018-08-20 16:25:11,793 DataCleaner.py:120 - Data cleanup started
INFO 2018-08-20 16:25:11,794 DataCleaner.py:122 - Data cleanup finished
INFO 2018-08-20 16:25:11,794 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'esclavo.hdp.com' using socket.getfqdn().
INFO 2018-08-20 16:25:11,799 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2018-08-20 16:25:11,802 main.py:439 - Connecting to Ambari server at https://maestro.hdp.com:8440 (10.137.44.53)
INFO 2018-08-20 16:25:11,802 NetUtil.py:70 - Connecting to https://maestro.hdp.com:8440/ca
INFO 2018-08-20 16:25:11,874 main.py:449 - Connected to Ambari server maestro.hdp.com
INFO 2018-08-20 16:25:11,875 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2018-08-20 16:25:11,876 AlertSchedulerHandler.py:280 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2018-08-20 16:25:11,876 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x7fa4efe427d0>; currently running: False
INFO 2018-08-20 16:25:13,926 hostname.py:106 - Read public hostname 'esclavo.hdp.com' using socket.getfqdn()
INFO 2018-08-20 16:25:13,930 Hardware.py:68 - Initializing host system information.
INFO 2018-08-20 16:25:14,015 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/0
INFO 2018-08-20 16:25:14,074 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'esclavo.hdp.com' using socket.getfqdn().
INFO 2018-08-20 16:25:14,079 Facter.py:202 - Directory: '/etc/resource_overrides' does not exist - it won't be used for gathering system resources.
I'm not sure what the problem is.
Created 08-21-2018 06:30 AM
Hi @yadir Aguilar,
From the logs:
Connection to https://maestro.hdp.com:8440/ca
INFO 2018-08-20 16:25:11,874 main.py:449 - Connected to the Ambari server maestro.hdp.com
It looks like your ambari-agent is connecting to maestro.hdp.com, and the connection succeeds.
Can you try restarting ambari-agent once and see if that helps:
ambari-agent restart
I don't see any specific error in the ambari-agent log you posted here. Look for ERROR entries in the log; what you attached is only INFO and WARNING messages. Also, please try to paste logs in code format.
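Because pasted logs often arrive as one long line, here is a minimal Python sketch (assumes Python 3.7+ for zero-width regex splits, and the "LEVEL YYYY-MM-DD ..." prefix seen in these logs; split_entries and filter_level are hypothetical helpers) that breaks such a paste back into entries and keeps only the ERROR ones:

```python
import re

# An entry starts with a log level followed by a date, e.g. "ERROR 2018-08-20 ..."
ENTRY_START = re.compile(r"(?=\b(?:DEBUG|INFO|WARNING|ERROR|CRITICAL) \d{4}-\d{2}-\d{2} )")


def split_entries(text: str) -> list:
    """Split a log pasted as one long line back into individual entries."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def filter_level(entries: list, level: str = "ERROR") -> list:
    """Keep only entries that start with the given level."""
    return [e for e in entries if e.startswith(level + " ")]
```

For a log file on disk, grep ERROR /var/log/ambari-agent/ambari-agent.log does the same job; the sketch is only useful when the line breaks have been lost.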
Hope this helps you.
Created 08-21-2018 01:20 PM
I found the error:
WARNING 2018-08-20 16:32:40,475 base_alert.py:138 - [Alert][smartsense_gateway_status] Unable to execute alert. [Alert][smartsense_gateway_status] Unable to extract JSON from JMX response
ERROR 2018-08-20 16:32:40,481 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://esclavo.hdp.com:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
)']
but I don't know how to resolve it.
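The ERROR above is the yarn_nodemanager_health alert getting "[Errno 111] Connection refused" from http://esclavo.hdp.com:8042/ws/v1/node/info. Connection refused typically means nothing is listening on that port (8042 is the default NodeManager web UI port), or that a firewall is actively rejecting the connection. A quick reachability check from the agent host is a plain TCP connect; this is a minimal Python 3 sketch (port_open is a hypothetical helper):

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (errno 111), timeouts and DNS failures all land here
        return False
```

For example, port_open("esclavo.hdp.com", 8042) returning False would suggest the NodeManager on esclavo.hdp.com is not running or not bound to that port.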
Created 08-21-2018 01:50 PM
Hi @yadir Aguilar,
It looks like SmartSense is not responding.
Can you perform the following steps and see if they help:
1) Run:
ambari-agent restart
2) Check the output of this command:
/usr/sbin/hst agent-status
3) If the output of command 2 hangs, try restarting hst-server from the Ambari UI and see if the heartbeat comes back.
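For step 3, one way to tell a hang apart from a normal answer is to run the command with a time limit. A minimal Python sketch, assuming Python 3.7+ (the run_with_timeout name and the 30-second default are assumptions, not anything HST defines):

```python
import subprocess


def run_with_timeout(cmd, timeout=30.0):
    """Run cmd; return its stripped stdout, or None if it is still running after `timeout` seconds."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left hanging
        return None
    return result.stdout.strip()
```

For example, run_with_timeout(["/usr/sbin/hst", "agent-status"]) returning None would correspond to the "hangs" case in step 3.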
Hope this troubleshooting helps.
Created 08-21-2018 02:11 PM
When I run /usr/sbin/hst agent-status, I get "registered". What is the command to restart the HST server?
Created 08-21-2018 02:39 PM
Hi @yadir Aguilar,
It looks like your hst-agent is OK.
What do you see when you run this command?
[root@asnaikh ~]# cat /var/log/ambari-agent/ambari-agent.log |grep -i heartbeat
INFO 2018-08-21 14:36:13,697 Controller.py:311 - Building heartbeat message
INFO 2018-08-21 14:36:13,699 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-08-21 14:36:14,001 Controller.py:320 - Sending Heartbeat (id = 2841)
INFO 2018-08-21 14:36:14,005 Controller.py:333 - Heartbeat response received (id = 2842)
INFO 2018-08-21 14:36:14,005 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-08-21 14:36:14,005 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-08-21 14:36:14,006 Controller.py:475 - Waiting 0.9 for next heartbeat
This is just to figure out whether it is an ambari-agent issue or an ambari-server issue.
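To see at a glance whether the agent is still heartbeating, this minimal Python 3 sketch (assumes the "LEVEL date time file:line - Sending Heartbeat (id = N)" format in the output above; last_heartbeat is a hypothetical helper) returns the time and id of the most recent "Sending Heartbeat" entry:

```python
import re
from datetime import datetime

HEARTBEAT_RE = re.compile(
    r"^\w+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \S+ - Sending Heartbeat \(id = (\d+)\)"
)


def last_heartbeat(lines):
    """Return (timestamp, id) of the last 'Sending Heartbeat' entry, or None if there is none."""
    last = None
    for line in lines:
        m = HEARTBEAT_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            last = (ts, int(m.group(2)))
    return last
```

Passing an open file works too, e.g. with open("/var/log/ambari-agent/ambari-agent.log") as f: print(last_heartbeat(f)). None, or a timestamp well in the past, means the agent has not been sending heartbeats.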
Can you also try restarting ambari-server:
ambari-server restart
and see if it helps.
Also, grep the ambari-server log for heartbeats from the problem node:
[root@anaikhdf1 ~]# cat /var/log/ambari-server/ambari-server.log |grep -i heartbeat|grep -i <problem_nodeFQDN>
Created 08-21-2018 03:03 PM
When I run:
[root@asnaikh ~]# cat /var/log/ambari-agent/ambari-agent.log | grep -i heartbeat
I get:
INFO 2018-08-21 08:58:29,074 HeartbeatHandlers.py:84 - Ambari-agent received 15 signal, stopping...
INFO 2018-08-21 08:58:29,890 HeartbeatHandlers.py:116 - Stop event received
INFO 2018-08-21 08:58:29,890 Controller.py:503 - Finished heartbeating and registering cycle
INFO 2018-08-21 09:44:03,002 HeartbeatHandlers.py:84 - Ambari-agent received 15 signal, stopping...
INFO 2018-08-21 09:44:12,704 HeartbeatHandlers.py:116 - Stop event received
INFO 2018-08-21 09:53:12,898 HeartbeatHandlers.py:84 - Ambari-agent received 15 signal, stopping...
INFO 2018-08-21 09:53:15,992 HeartbeatHandlers.py:116 - Stop event received
INFO 2018-08-21 10:42:50,625 HeartbeatHandlers.py:84 - Ambari-agent received 15 signal, stopping...
INFO 2018-08-21 10:42:52,086 HeartbeatHandlers.py:116 - Stop event received
INFO 2018-08-21 10:53:41,375 HeartbeatHandlers.py:84 - Ambari-agent received 15 signal, stopping...
INFO 2018-08-21 10:53:44,646 HeartbeatHandlers.py:116 - Stop event received
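Each stop in this output starts with "received 15 signal". Signal 15 is SIGTERM, the normal termination signal, which is consistent with the agent being stopped or restarted (for example by ambari-agent restart) rather than crashing. Unlike the healthy output earlier in the thread, no "Sending Heartbeat" lines appear here. A minimal Python 3 sketch (stop_events is a hypothetical helper; the regex assumes the log format shown) that lists these stop events with the signal name:

```python
import re
import signal

STOP_RE = re.compile(
    r"^\w+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \S+ - Ambari-agent received (\d+) signal"
)


def stop_events(lines):
    """List (timestamp, signal name) for every 'Ambari-agent received N signal' entry."""
    events = []
    for line in lines:
        m = STOP_RE.match(line)
        if m:
            # signal.Signals maps the number to its name, e.g. 15 -> SIGTERM
            events.append((m.group(1), signal.Signals(int(m.group(2))).name))
    return events
```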
Created 08-21-2018 02:53 PM
I just found this:
[root@maestro ~]# systemctl status ambari-server
● ambari-server.service - LSB: ambari-server daemon
Loaded: loaded (/etc/rc.d/init.d/ambari-server; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2018-08-21 10:48:01 EDT; 1min 22s ago
Docs: man:systemd-sysv-generator(8)
Process: 24652 ExecStart=/etc/rc.d/init.d/ambari-server start (code=exited, status=1/FAILURE)
Aug 21 10:48:01 maestro.hdp.com systemd[1]: Starting LSB: ambari-server daem....
Aug 21 10:48:01 maestro.hdp.com ambari-server[24652]: Using python /usr/bin/...
Aug 21 10:48:01 maestro.hdp.com ambari-server[24652]: Starting ambari-server
Aug 21 10:48:01 maestro.hdp.com ambari-server[24652]: ERROR: Exiting with exi...
Aug 21 10:48:01 maestro.hdp.com ambari-server[24652]: REASON: Ambari Server i...
Aug 21 10:48:01 maestro.hdp.com systemd[1]: ambari-server.service: control p...1
Aug 21 10:48:01 maestro.hdp.com systemd[1]: Failed to start LSB: ambari-serv....
Aug 21 10:48:01 maestro.hdp.com systemd[1]: Unit ambari-server.service enter....
Aug 21 10:48:01 maestro.hdp.com systemd[1]: ambari-server.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
The same happens with ambari-agent.
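Because systemctl ellipsized the ERROR and REASON lines, rerunning it with -l (--full) and --no-pager should print them in full. This minimal Python 3.7+ sketch (parse_active and unit_status are hypothetical helpers) captures that output and pulls the state out of the "Active:" line. systemctl status exits non-zero for a failed unit, so the return code is deliberately not treated as an error.

```python
import re
import subprocess

# Matches e.g. "Active: failed (Result: exit-code) since ..."
ACTIVE_RE = re.compile(r"Active:\s+(\S+)(?:\s+\(([^)]*)\))?")


def parse_active(status_text):
    """Return (state, detail) from the 'Active:' line, or None if it is absent."""
    m = ACTIVE_RE.search(status_text)
    if not m:
        return None
    return m.group(1), m.group(2)


def unit_status(unit):
    """Capture full, unpaged 'systemctl status' output for a unit."""
    # -l stops line ellipsizing; check=False because failed units exit non-zero
    result = subprocess.run(["systemctl", "status", unit, "-l", "--no-pager"],
                            capture_output=True, text=True, check=False)
    return result.stdout
```

For the status shown above, parse_active(unit_status("ambari-server")) should give ("failed", "Result: exit-code"), and the captured text should contain the REASON line without truncation.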