Created 12-09-2017 12:55 AM
hi
I am attempting to automate the creation of an HDP cluster with Ansible, and I have followed the documentation to set up ambari-server and ambari-agent to run as a non-root user (specifically, a user named ambari).
As soon as the ambari-agent detects the ambari-server, it disconnects with the following stack trace. It looks like a permission issue, but I cannot figure out what it is. FWIW, if ambari-agent is run as root it works fine.
INFO 2017-12-08 18:33:38,479 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-12-08 18:33:38,480 main.py:437 - Connecting to Ambari server at https://ambari-server.mydomain.com:8440 (192.168.20.20)
INFO 2017-12-08 18:33:38,480 NetUtil.py:70 - Connecting to https://ambari-server.mydomain.com:8440/ca
INFO 2017-12-08 18:33:38,526 main.py:447 - Connected to Ambari server ambari-server.mydomain.com
INFO 2017-12-08 18:33:38,526 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'ambari-server.mydomain.com' using socket.getfqdn().
INFO 2017-12-08 18:33:38,527 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-12-08 18:33:38,527 AlertSchedulerHandler.py:280 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-12-08 18:33:38,527 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x10e93d0>; currently running: False
ERROR 2017-12-08 18:33:38,528 Controller.py:506 - Controller thread failed with exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 486, in run
    self.actionQueue = ActionQueue(self.config, controller=self)
  File "/usr/lib/python2.6/site-packages/ambari_agent/ActionQueue.py", line 79, in __init__
    self.statusCommandResultQueue = multiprocessing.Queue() # this queue is filled by StatuCommandsExecutor.
  File "/usr/lib64/python2.6/multiprocessing/__init__.py", line 213, in Queue
    return Queue(maxsize)
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 37, in __init__
    self._rlock = Lock()
  File "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 117, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
  File "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 49, in __init__
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 13] Permission denied
ERROR 2017-12-08 18:33:40,530 main.py:477 - Exiting with exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 472, in <module>
    main(heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 451, in main
    run_threads(server_hostname, heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 341, in run_threads
    controller.get_status_commands_executor().kill("AGENT_STOPPED", can_relaunch=False)
AttributeError: 'NoneType' object has no attribute 'kill'
INFO 2017-12-08 18:33:40,532 ExitHelper.py:56 - Performing cleanup before exiting...
INFO 2017-12-08 18:33:40,532 threadpool.py:120 - Shutting down thread pool
INFO 2017-12-08 18:33:40,532 scheduler.py:606 - Scheduler has been shut down
INFO 2017-12-08 18:33:40,533 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
INFO 2017-12-08 18:33:40,533 AlertSchedulerHandler.py:185 - [AlertScheduler] Stopped the alert scheduler.
INFO 2017-12-08 18:33:40,533 threadpool.py:120 - Shutting down thread pool
INFO 2017-12-08 18:33:40,550 Controller.py:151 - Server connection disconnected.
As mentioned, all the sudo permissions listed here are present in the /etc/sudoers.d/ambari file.
I should probably add that the above happened when running in LXC containers using vagrant-lxc, and it seems to be related to the permissions the Python scripts need on shared memory (/dev/shm).
However, on VMware this worked fine.
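One way to test the shared-memory theory is to reproduce the failing call outside the agent. The sketch below is not part of Ambari; the function name check_shared_memory is made up for illustration. It reports the mode of /dev/shm and then tries to create a multiprocessing lock, which is the same SemLock step that fails in the traceback. On Linux, /dev/shm is normally a tmpfs with mode 1777, so a more restrictive mode inside the container would be a likely culprit, though that is an assumption about this particular setup.

```python
import errno
import multiprocessing
import os
import stat


def check_shared_memory(path="/dev/shm"):
    """Report whether the current user can use POSIX shared memory.

    Hypothetical helper, not part of Ambari. Returns a dict of findings.
    """
    report = {"path": path, "exists": os.path.isdir(path)}
    if report["exists"]:
        mode = os.stat(path).st_mode
        # Permission bits only, e.g. 1777 on a typical Linux host.
        report["mode"] = oct(stat.S_IMODE(mode))
        # Creating a semaphore needs write and search access here.
        report["writable"] = os.access(path, os.W_OK | os.X_OK)
    try:
        # Same underlying call that fails in the agent traceback.
        multiprocessing.Lock()
        report["semaphore_ok"] = True
    except OSError as exc:
        report["semaphore_ok"] = False
        report["error"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return report


if __name__ == "__main__":
    for key, value in sorted(check_shared_memory().items()):
        print("%s: %s" % (key, value))
```

Running this as the ambari user and again as root inside the container should show whether the two accounts differ. If semaphore_ok is False with error EACCES only for the non-root user, the problem is in the container's /dev/shm setup rather than in the agent itself.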
Created 12-09-2017 07:48 AM
This looks like a duplicate of the thread: https://community.hortonworks.com/questions/152000/ambar-agent-non-root-disconnectes-from-ambari-ser...
Please close one of them.
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 341, in run_threads
    controller.get_status_commands_executor().kill("AGENT_STOPPED", can_relaunch=False)
AttributeError: 'NoneType' object has no attribute 'kill'
Based on the above error, it looks like your ambari-agent might not be installed properly, or it might have missing or outdated scripts.
So please check the ambari-agent package version on the problematic host to confirm that the agent version is correct.
If needed, please reinstall the ambari agent.
# rpm -qa | grep ambari-agent
# yum clean all
# yum reinstall ambari-agent -y