ambari-agent non-root installation fails to connect to ambari-server (non-root)

Hi,

I am attempting to automate the creation of an HDP cluster with Ansible and have followed the documentation to set up ambari-server and ambari-agent to run as a non-root user (specifically, a user named ambari). The steps were:

  1. The sudo permissions/commands defined in the documentation were added to /etc/sudoers.d/ambari.
  2. ambari-agent was installed and set up.
  3. The agent configuration was changed: the hostname was set to the FQDN of the host, and run_as_user=ambari.
  4. ambari-agent was started, and it starts up successfully as the ambari user.
  5. ambari-server setup was run from a script that takes the --jdbc and --database options for an external database.
  6. Ownership of /var/lib/ambari-server, /var/log/ambari-server and /etc/ambari-server was changed to the non-root user ambari.
  7. ambari-server.user=ambari was added to /etc/ambari-server/conf/ambari.properties.
  8. ambari-server was started.
  9. As a result, ambari-server runs as the ambari user.
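
The agent-side settings from step 3 can be sanity-checked with a short script. This is only a sketch, assuming the agent configuration is an INI file with `hostname` under a `[server]` section and `run_as_user` under an `[agent]` section; the function name `find_config_problems` and the `SAMPLE` text are hypothetical and used here for illustration only.

```python
from configparser import ConfigParser


def find_config_problems(ini_text, expected_user, expected_server):
    """Return a list of human-readable mismatches; an empty list means the checks passed."""
    # Interpolation is disabled so literal '%' characters in values are not misread.
    parser = ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    problems = []

    # Assumed layout: [server] hostname and [agent] run_as_user.
    actual_server = parser.get("server", "hostname", fallback=None)
    if actual_server != expected_server:
        problems.append(
            "server hostname is %r, expected %r" % (actual_server, expected_server)
        )

    actual_user = parser.get("agent", "run_as_user", fallback=None)
    if actual_user != expected_user:
        problems.append(
            "run_as_user is %r, expected %r" % (actual_user, expected_user)
        )
    return problems


# Hypothetical sample text standing in for the real agent configuration file.
SAMPLE = """
[server]
hostname = ambari-server.mydomain.com

[agent]
run_as_user = ambari
"""

if __name__ == "__main__":
    print(find_config_problems(SAMPLE, "ambari", "ambari-server.mydomain.com"))
```

To check a real host, the file contents would be read and passed in as `ini_text` in place of `SAMPLE`.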

As soon as the ambari-agent detects the ambari-server, it disconnects with the following stack trace. It looks like a permission issue, but I cannot figure out what it is. FWIW, if ambari-agent is run as root it works fine.

INFO 2017-12-08 18:33:38,479 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-12-08 18:33:38,480 main.py:437 - Connecting to Ambari server at https://ambari-server.mydomain.com:8440 (192.168.20.20)
INFO 2017-12-08 18:33:38,480 NetUtil.py:70 - Connecting to https://ambari-server.mydomain.com:8440/ca
INFO 2017-12-08 18:33:38,526 main.py:447 - Connected to Ambari server ambari-server.mydomain.com
INFO 2017-12-08 18:33:38,526 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'ambari-server.mydomain.com' using socket.getfqdn().
INFO 2017-12-08 18:33:38,527 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-12-08 18:33:38,527 AlertSchedulerHandler.py:280 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-12-08 18:33:38,527 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x10e93d0>; currently running: False
ERROR 2017-12-08 18:33:38,528 Controller.py:506 - Controller thread failed with exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 486, in run
    self.actionQueue = ActionQueue(self.config, controller=self)
  File "/usr/lib/python2.6/site-packages/ambari_agent/ActionQueue.py", line 79, in __init__
    self.statusCommandResultQueue = multiprocessing.Queue() # this queue is filled by StatuCommandsExecutor.
  File "/usr/lib64/python2.6/multiprocessing/__init__.py", line 213, in Queue
    return Queue(maxsize)
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 37, in __init__
    self._rlock = Lock()
  File "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 117, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
  File "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 49, in __init__
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 13] Permission denied
ERROR 2017-12-08 18:33:40,530 main.py:477 - Exiting with exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 472, in <module>
    main(heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 451, in main
    run_threads(server_hostname, heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 341, in run_threads
    controller.get_status_commands_executor().kill("AGENT_STOPPED", can_relaunch=False)
AttributeError: 'NoneType' object has no attribute 'kill'
INFO 2017-12-08 18:33:40,532 ExitHelper.py:56 - Performing cleanup before exiting...
INFO 2017-12-08 18:33:40,532 threadpool.py:120 - Shutting down thread pool
INFO 2017-12-08 18:33:40,532 scheduler.py:606 - Scheduler has been shut down
INFO 2017-12-08 18:33:40,533 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
INFO 2017-12-08 18:33:40,533 AlertSchedulerHandler.py:185 - [AlertScheduler] Stopped the alert scheduler.
INFO 2017-12-08 18:33:40,533 threadpool.py:120 - Shutting down thread pool
INFO 2017-12-08 18:33:40,550 Controller.py:151 - Server connection disconnected.

As mentioned, all the sudo permissions listed in the documentation are present in the /etc/sudoers.d/ambari file.

I should add that the above happened when running in LXC containers using vagrant-lxc, and it seems to be related to the permissions the Python scripts need on shared memory (/dev/shm).

However, on VMware this worked fine.
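
One way to narrow this down is to reproduce the failing call outside of ambari-agent, running as the ambari user inside the container. The sketch below attempts the same operation that fails in the traceback (creating a multiprocessing lock, which on Linux is normally backed by a POSIX semaphore under /dev/shm) and reports the permissions on that directory. The helper names `describe_dir` and `can_create_semlock` are made up for this example, and the script is a diagnostic aid rather than a fix.

```python
import multiprocessing
import os
import stat


def describe_dir(path="/dev/shm"):
    """Return basic permission facts about a directory, or None if it cannot be stat'ed."""
    try:
        st = os.stat(path)
    except OSError:
        return None
    return {
        "mode": oct(stat.S_IMODE(st.st_mode)),
        # Creating entries in a directory needs both write and search permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


def can_create_semlock():
    """Try to create a multiprocessing lock; return (ok, error_or_None)."""
    try:
        multiprocessing.Lock()
        return True, None
    except (OSError, ImportError) as exc:
        # OSError: the semaphore could not be created (e.g. Errno 13).
        # ImportError: the platform has no usable semaphore implementation.
        return False, exc


if __name__ == "__main__":
    print("/dev/shm: %s" % (describe_dir("/dev/shm"),))
    ok, err = can_create_semlock()
    print("semaphore creation: %s" % ("ok" if ok else "failed: %s" % err))
```

If the lock can be created as root but not as the ambari user, that would point at the permissions or mount options of /dev/shm in the container rather than at the agent installation.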

1 REPLY

@Anshuman Mehta

This looks like a duplicate of the thread: https://community.hortonworks.com/questions/152000/ambar-agent-non-root-disconnectes-from-ambari-ser...

Please close one of them.

  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 341, in run_threads
    controller.get_status_commands_executor().kill("AGENT_STOPPED", can_relaunch=False)
AttributeError: 'NoneType' object has no attribute 'kill'

Based on the above error, it looks like ambari-agent might not be installed properly, or some of its scripts might be missing or from an older version.

Please check the ambari-agent package version on the problematic host to confirm that the agent version is correct.

If needed, please reinstall the ambari agent:

# rpm -qa | grep ambari-agent
# yum clean all
# yum reinstall ambari-agent -y
