Created on 03-24-2020 10:18 PM - last edited on 03-24-2020 11:33 PM by ask_bill_brooks
I am getting the following error when trying to add a host to Cloudera:
[25/Mar/2020 04:56:45 +0000] 1611 MainThread agent ERROR Failed to handle Heartbeat Response:{....} (A big response)
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1526, in handle_heartbeat_response
self._handle_heartbeat_response(response)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1661, in _handle_heartbeat_response
self._update_parcel_activation_state(response)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1572, in _update_parcel_activation_state
manage_old_parcels = old_response.get("create_parcel_symlinks")
AttributeError: 'NoneType' object has no attribute 'get'
This causes the following error:
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/parcel.py", line 125, in refresh
pid = ParcelId.dir(child)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/parcel_id.py", line 75, in dir
raise Exception("Invalid parcel directory: %s" % (dir))
Exception: Invalid parcel directory: CDH
I have checked the /etc/hosts file, which seems fine and consistent. Is there any other way to debug this?
Thanks,
Created 04-21-2020 01:19 AM
@ankesh_clo Can you check the /var/lib/cloudera-scm-agent/ directory on the new host and delete the file response.avro if it exists?
After that, go to CM > Hosts > All Hosts, click the newly added host, and compare its Host ID with the contents of the /var/lib/cloudera-scm-agent/uuid file. If they are not in sync, edit the uuid file to match the Host ID and restart the agent.
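If it helps, the two checks above can be scripted. This is only a rough sketch in Python 3 (not part of the agent, which runs on Python 2.7). The function name check_agent_state is made up for illustration, and the Host ID still has to be copied by hand from the CM Hosts page.

```python
import sys
from pathlib import Path

# Directory named in the reply above.
AGENT_DIR = Path("/var/lib/cloudera-scm-agent")


def check_agent_state(agent_dir, expected_host_id):
    """Report whether response.avro exists and whether the uuid file matches.

    expected_host_id is the Host ID copied manually from the CM Hosts page.
    """
    agent_dir = Path(agent_dir)
    uuid_file = agent_dir / "uuid"
    # The uuid file holds a single identifier; strip any trailing newline.
    local_uuid = uuid_file.read_text().strip() if uuid_file.exists() else None
    return {
        "response_avro_present": (agent_dir / "response.avro").exists(),
        "local_uuid": local_uuid,
        "uuid_matches": local_uuid == expected_host_id,
    }


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_agent.py <HOST_ID_FROM_CM>
    if len(sys.argv) == 2:
        print(check_agent_state(AGENT_DIR, sys.argv[1]))
```

The script only reads; deleting response.avro and editing the uuid file are left as manual steps.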
Created on 04-26-2020 07:38 PM - edited 04-26-2020 08:21 PM
@GangWar Thanks for the reply.
But this didn't solve the issue. There is no parcel downloaded and no progress. The same error appears on the agent as posted above.
To restart the agent, I use sudo service cloudera-scm-agent restart.
Please let me know where I am going wrong.
Thanks!
Created on 04-27-2020 01:26 AM - edited 04-27-2020 01:37 AM
@ankesh_clo Looking at the stack trace, this also seems to be an issue with symlinks. Try the method below once.
1. Stop the agent.
2. Remove the broken symlinks from /etc/alternatives and the corresponding conf files from /var/lib/alternatives.
3. Remove the parcels/files from:
/opt/cloudera/parcels
/opt/cloudera/parcels/.flood
/opt/cloudera/parcel-cache
4. Start the agent. The agent will redistribute the parcel and fix the alternatives.
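For step 2, a quick way to see whether any symlink in /etc/alternatives is actually broken is a small scan like the one below. It is a sketch in Python 3, and find_broken_symlinks is a name invented here. It only lists dangling links and removes nothing.

```python
import os


def find_broken_symlinks(directory):
    """Return sorted paths of symlinks in `directory` whose targets do not exist."""
    broken = []
    for entry in os.scandir(directory):
        # os.path.exists follows symlinks, so it is False for a dangling link.
        if entry.is_symlink() and not os.path.exists(entry.path):
            broken.append(entry.path)
    return sorted(broken)


if __name__ == "__main__":
    target = "/etc/alternatives"
    if os.path.isdir(target):
        for path in find_broken_symlinks(target):
            print(path)
```

If nothing is printed, there are no dangling links in that directory and step 2 can probably be skipped.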
Created 04-27-2020 01:41 AM
@GangWar I do not see broken links in /etc/alternatives and there is no folder named /var/lib/alternatives.
Also, there are no parcel files downloaded on the new host. The parcel and parcel-cache folders are empty.
Should I add a more detailed log?
Thanks,
Created 04-27-2020 10:46 AM
Created 04-27-2020 08:01 PM
@GangWar Thanks for the help. I am attaching links to the log files from the server and the agent host.
Agent: https://ideone.com/TNKtRZ
Server: https://ideone.com/RHt1Yl
PS: Sorry, there was an issue uploading the log files, so I have uploaded only the relevant portions.
Please visit the links and download the log files (the download button is at the top of the editor, below the URL box).
Created 04-29-2020 04:53 AM
@ankesh_clo Looking at the logs, the issue also seems to be caused by a wrong parcel URL.
2020-04-29 01:28:33,070 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: (1 skipped) Failed to download manifest. Status code: 404 URI: https://archive.cloudera.com/accumulo-c6/parcels/latest/manifest.json/
The correct link is https://archive.cloudera.com/accumulo/parcels/latest/
Either correct this link in Parcel configuration or just remove this Accumulo parcel link and try again.
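To check a repository URL before putting it into the parcel configuration, something like the following sketch can be used. It is Python 3, the function names are made up for illustration, and it assumes the manifest sits at manifest.json directly under the repository URL, as the log line above suggests.

```python
import urllib.error
import urllib.request


def manifest_url(repo_url):
    """Build the manifest URL for a parcel repository URL (trailing slash optional)."""
    return repo_url.rstrip("/") + "/manifest.json"


def manifest_status(repo_url, timeout=10):
    """Return the HTTP status code for the manifest, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(manifest_url(repo_url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout and similar.
        return None
```

Calling manifest_status("https://archive.cloudera.com/accumulo/parcels/latest/") from the CM server host should show whether the manifest is reachable from there. A 404 would match the error in the log.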
Created on 05-03-2020 08:19 PM - edited 05-03-2020 10:32 PM
@GangWar I removed the link from the configuration as I am not using Accumulo. The download error is gone, but the other errors still persist.
Created 05-04-2020 10:35 AM
Hi @ankesh_clo,
Could you please share the output of the three commands below from the host? This will help us find out whether there are any permission issues.
ls -altr /var/lib/cloudera-scm-agent
and
ls -altr /opt/cloudera
and
ls -altr /opt/cloudera/parcels
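If it is easier to collect everything in one go, the same ownership and mode information can be gathered with a short Python 3 sketch like this one. describe_entries is a hypothetical helper, not a Cloudera tool, and unlike ls -a it does not list the . and .. entries.

```python
import grp
import os
import pwd
import stat


def describe_entries(directory):
    """Return (mode, owner, group, name) tuples for each entry in `directory`."""
    rows = []
    for name in sorted(os.listdir(directory)):
        st = os.lstat(os.path.join(directory, name))
        # Fall back to the numeric id when no matching user or group exists.
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            owner = str(st.st_uid)
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)
        rows.append((stat.filemode(st.st_mode), owner, group, name))
    return rows


if __name__ == "__main__":
    # The three directories from the commands above.
    for d in ("/var/lib/cloudera-scm-agent", "/opt/cloudera", "/opt/cloudera/parcels"):
        if os.path.isdir(d):
            print(d)
            for row in describe_entries(d):
                print("  ", *row)
```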
Thanks,
Li
Li Wang, Technical Solution Manager
Created 05-05-2020 08:12 PM
Hi @lwang
Thanks for responding.
I ran these commands on the new host I am trying to add. Here are the outputs, in order:
ls -altr /var/lib/cloudera-scm-agent
total 36
drwxr-xr-x 46 root root 4096 Apr 21 22:41 ..
-rw-r--r-- 1 root root 36 Apr 21 22:41 uuid
-rw-r--r-- 1 root root 36 Apr 21 22:41 cm_guid
-rw------- 1 root root 14575 May 5 02:19 response.avro
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 May 5 02:19 .
-rw------- 1 root root 2 May 6 03:09 active_parcels.json
ls -altr /opt/cloudera
total 40
drwxr-xr-x 3 root root 4096 Apr 21 22:40 ..
drwxr-xr-x 2 root root 4096 Apr 21 22:41 parcel-cache
drwxr-xr-x 6 cloudera-scm cloudera-scm 4096 Apr 21 22:41 .
drwxr-xr-x 10 root root 4096 Apr 27 02:59 cm-agent
drwxr-xr-x 27 root root 20480 Apr 27 02:59 cm
drwxr-xr-x 3 root root 4096 May 4 02:03 parcels
ls -altr /opt/cloudera/parcels
drwxr-xr-x 6 cloudera-scm cloudera-scm 4096 Apr 21 22:41 ..
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 May 4 02:03 .flood
drwxr-xr-x 3 root root 4096 May 4 02:03 .
Please have a look. Thanks!
Created 05-09-2020 06:59 AM
@ankesh_clo Can you delete the files below and then restart the agent?
-rw------- 1 root root 14575 May 5 02:19 response.avro
-rw------- 1 root root 2 May 6 03:09 active_parcels.json
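A small sketch of that cleanup in Python 3, for anyone who wants to script it. remove_state_files and its dry_run flag are assumptions made here, not agent features. Stop the agent first and run it as root, since both files are owned by root.

```python
from pathlib import Path

# The two files named in the listing above.
STATE_FILES = ("response.avro", "active_parcels.json")


def remove_state_files(agent_dir, dry_run=True):
    """Delete the agent state files if present; return the names that were found.

    With dry_run=True (the default) nothing is deleted, only reported.
    """
    found = []
    for name in STATE_FILES:
        path = Path(agent_dir) / name
        if path.is_file():
            if not dry_run:
                path.unlink()
            found.append(name)
    return found
```

Calling remove_state_files("/var/lib/cloudera-scm-agent") shows what would be removed. Pass dry_run=False to actually delete, then restart the agent.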
Created on 05-10-2020 10:42 PM - edited 05-10-2020 10:43 PM
@GangWar I did what you asked, and I get this on the agent:
[11/May/2020 05:22:34 +0000] 6495 MainThread agent INFO To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[11/May/2020 05:22:36 +0000] 6495 MainThread supervisor INFO Trying to connect to supervisor (Attempt 1)
[11/May/2020 05:22:36 +0000] 6495 MainThread supervisor INFO Supervisor version: 3.0, pid: 1614
[11/May/2020 05:22:36 +0000] 6495 MainThread supervisor INFO Successfully connected to supervisor
[11/May/2020 05:22:36 +0000] 6495 MainThread agent INFO Supervisor version: 3.0, pid: 1614
[11/May/2020 05:22:36 +0000] 6495 MainThread agent INFO Connecting to previous supervisor: agent-1614-1589173066.
[11/May/2020 05:22:38 +0000] 6495 MainThread supervisor INFO Triggering supervisord update.
[11/May/2020 05:22:38 +0000] 6495 MainThread _cplogging INFO [11/May/2020:05:22:38] ENGINE Bus STARTING
[11/May/2020 05:22:38 +0000] 6495 MainThread _cplogging INFO [11/May/2020:05:22:38] ENGINE Started monitor thread '_TimeoutMonitor'.
[11/May/2020 05:22:38 +0000] 6495 MainThread _cplogging INFO [11/May/2020:05:22:38] ENGINE Serving on http://127.0.0.1:9001
[11/May/2020 05:22:38 +0000] 6495 MainThread _cplogging INFO [11/May/2020:05:22:38] ENGINE Bus STARTED
[11/May/2020 05:22:40 +0000] 6495 MainThread daemon INFO New monitor: (<cmf.monitor.host.HostMonitor object at 0x7f46e2d12ed0>,)
[11/May/2020 05:22:40 +0000] 6495 MonitorDaemon-Scheduler daemon INFO Monitor ready to report: ('HostMonitor',)
[11/May/2020 05:22:40 +0000] 6495 MainThread agent INFO Setting default socket timeout to 45
[11/May/2020 05:22:40 +0000] 6495 MainThread agent INFO Failed to read available parcel file: [Errno 2] No such file or directory: '/var/lib/cloudera-scm-agent/active_parcels.json'
[11/May/2020 05:22:40 +0000] 6495 MainThread agent INFO Loading last saved hb response to complete initialization: /var/lib/cloudera-scm-agent/response.avro
[11/May/2020 05:22:40 +0000] 6495 Monitor-HostMonitor network_interfaces INFO NIC iface ens5 doesn't support ETHTOOL (95)
[11/May/2020 05:22:40 +0000] 6495 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.02 min:0.02 mean:0.02 max:0.02 LIFE_MAX:0.02
[11/May/2020 05:22:40 +0000] 6495 MainThread agent INFO CM server guid: 513d3669-b5a8-49c0-863a-c0396dff5c7b
[11/May/2020 05:22:40 +0000] 6495 MainThread agent INFO Using parcels directory from server provided value: /opt/cloudera/parcels
[11/May/2020 05:22:40 +0000] 6495 MainThread parcel INFO Agent does create users/groups and apply file permissions
[11/May/2020 05:22:40 +0000] 6495 MainThread downloader INFO Downloader path: /opt/cloudera/parcel-cache
[11/May/2020 05:22:40 +0000] 6495 MainThread parcel_cache INFO Using /opt/cloudera/parcel-cache for parcel cache
[11/May/2020 05:22:40 +0000] 6495 MainThread throttling_logger WARNING Failed parsing alternatives line: rename string index out of range link best version is /usr/bin/file-rename
[11/May/2020 05:22:40 +0000] 6495 MainThread agent INFO Flood daemon (re)start attempt
[11/May/2020 05:22:42 +0000] 6495 MainThread firehoses INFO Reporting interval updated: 5.0 -> 60
[11/May/2020 05:22:42 +0000] 6495 MainThread agent ERROR Failed to handle Heartbeat Response: {u'firehoses': [{u'rol [big response....]
-----------------------
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1528, in handle_heartbeat_response
self._handle_heartbeat_response(response)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1661, in _handle_heartbeat_response
self._update_parcel_activation_state(response)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1572, in _update_parcel_activation_state
manage_old_parcels = old_response.get("create_parcel_symlinks")
AttributeError: 'NoneType' object has no attribute 'get'
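For reference, the AttributeError itself is what Python raises when .get() is called on None instead of a dict. The sketch below reproduces that in isolation. It is not the agent's code and not a fix. The function names are invented, and why old_response is None on this host is not established in this thread.

```python
def read_flag(old_response):
    # Same call shape as the failing line in the traceback; raises
    # AttributeError when old_response is None.
    return old_response.get("create_parcel_symlinks")


def read_flag_guarded(old_response):
    # Illustrative guard only: treat a missing previous response as "no value".
    if old_response is None:
        return None
    return old_response.get("create_parcel_symlinks")
```

So the traceback indicates that the agent had no previous heartbeat response object available when it reached that line.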