ambari-agent (non-root) disconnects from ambari-server (non-root) (LXC instance)

Expert Contributor

Hi,

I am attempting to automate the creation of an HDP cluster with Ansible and have followed the documentation to set up ambari-server and ambari-agent to run as a non-root user (specifically, a user named ambari).

  1. The appropriate sudo permissions/commands, as defined in the documentation, were added to /etc/sudoers.d/ambari.
  2. ambari-agent was installed and set up.
  3. The agent configuration was changed to set hostname to the FQDN of the host and run_as_user=ambari.
  4. ambari-agent was started, and it starts up successfully as the ambari user.
  5. ambari-server setup was run from a script that took the --jdbc and --database options for an external database.
  6. Ownership of /var/lib/ambari-server, /var/log/ambari-server, and /etc/ambari-server was changed to the non-root user ambari.
  7. ambari-server.user=ambari was added to /etc/ambari-server/conf/ambari.properties.
  8. ambari-server was started.
  9. This makes ambari-server run as the ambari user.

As soon as ambari-agent detects ambari-server, it connects to it and then fails with the following stack trace. It looks like a permission issue, but I cannot figure out what it is. FWIW, if ambari-agent is run as root it works fine.

INFO 2017-12-08 18:33:38,479 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2017-12-08 18:33:38,480 main.py:437 - Connecting to Ambari server at https://ambari-server.mydomain.com:8440 (192.168.20.20)
INFO 2017-12-08 18:33:38,480 NetUtil.py:70 - Connecting to https://ambari-server.mydomain.com:8440/ca
INFO 2017-12-08 18:33:38,526 main.py:447 - Connected to Ambari server ambari-server.mydomain.com
INFO 2017-12-08 18:33:38,526 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'ambari-server.mydomain.com' using socket.getfqdn().
INFO 2017-12-08 18:33:38,527 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
WARNING 2017-12-08 18:33:38,527 AlertSchedulerHandler.py:280 - [AlertScheduler] /var/lib/ambari-agent/cache/alerts/definitions.json not found or invalid. No alerts will be scheduled until registration occurs.
INFO 2017-12-08 18:33:38,527 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x10e93d0>; currently running: False
ERROR 2017-12-08 18:33:38,528 Controller.py:506 - Controller thread failed with exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 486, in run
    self.actionQueue = ActionQueue(self.config, controller=self)
  File "/usr/lib/python2.6/site-packages/ambari_agent/ActionQueue.py", line 79, in __init__
    self.statusCommandResultQueue = multiprocessing.Queue() # this queue is filled by StatuCommandsExecutor.
  File "/usr/lib64/python2.6/multiprocessing/__init__.py", line 213, in Queue
    return Queue(maxsize)
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 37, in __init__
    self._rlock = Lock()
  File "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 117, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
  File "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 49, in __init__
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 13] Permission denied
ERROR 2017-12-08 18:33:40,530 main.py:477 - Exiting with exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 472, in <module>
    main(heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 451, in main
    run_threads(server_hostname, heartbeat_stop_callback)
  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 341, in run_threads
    controller.get_status_commands_executor().kill("AGENT_STOPPED", can_relaunch=False)
AttributeError: 'NoneType' object has no attribute 'kill'
INFO 2017-12-08 18:33:40,532 ExitHelper.py:56 - Performing cleanup before exiting...
INFO 2017-12-08 18:33:40,532 threadpool.py:120 - Shutting down thread pool
INFO 2017-12-08 18:33:40,532 scheduler.py:606 - Scheduler has been shut down
INFO 2017-12-08 18:33:40,533 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
INFO 2017-12-08 18:33:40,533 AlertSchedulerHandler.py:185 - [AlertScheduler] Stopped the alert scheduler.
INFO 2017-12-08 18:33:40,533 threadpool.py:120 - Shutting down thread pool
INFO 2017-12-08 18:33:40,550 Controller.py:151 - Server connection disconnected.

As mentioned, all the sudo permissions listed in the documentation are present in the /etc/sudoers.d/ambari file.

I should probably add that the above happened when running in LXC containers using vagrant-lxc, and it seems to be related to the permissions the Python scripts need on shared memory (/dev/shm).

However, on VMware this worked fine.
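One way to test the /dev/shm suspicion directly is to repeat, as the ambari user inside the container, the same lock creation that fails in the traceback. The following is only a minimal sketch and not part of Ambari: the function name check_shared_memory is hypothetical, and it assumes that POSIX semaphores on Linux are backed by files under /dev/shm, which is normally mode 1777 (world-writable with the sticky bit).

```python
import multiprocessing
import os
import stat


def check_shared_memory(shm_dir="/dev/shm"):
    """Report whether the current user can create a multiprocessing lock.

    Hypothetical helper for diagnosis only; it is not an Ambari API.
    """
    result = {"shm_dir": shm_dir, "exists": os.path.isdir(shm_dir)}
    if result["exists"]:
        # Permission bits of the directory, e.g. 1777 on a healthy system.
        mode = stat.S_IMODE(os.stat(shm_dir).st_mode)
        result["mode"] = oct(mode)
        # Creating a semaphore needs write and search permission here.
        result["writable"] = os.access(shm_dir, os.W_OK | os.X_OK)
    try:
        # Same underlying SemLock creation that fails in the agent traceback.
        multiprocessing.Lock()
        result["semlock_ok"] = True
        result["error"] = None
    except OSError as exc:
        result["semlock_ok"] = False
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    for key, value in sorted(check_shared_memory().items()):
        print("%s: %s" % (key, value))
```

If semlock_ok comes back False when run as ambari but True when run as root, the ownership or mode of /dev/shm inside the container would be a more likely cause than the sudoers rules.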

2 REPLIES

Re: ambari-agent (non-root) disconnects from ambari-server (non-root) (LXC instance)

Expert Contributor

Check your /etc/hosts file and make sure it has entries for all hosts. Also check that you have pulled the correct version of Ambari for your OS: did you pull the CentOS 6 or the CentOS 7 build? Mismatches can create problems with Python. It doesn't look like you have that issue, but if you are using Satellite this can happen when the wrong link is used. Lastly, check the firewalls.
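The name resolution and firewall checks can be done from the agent host with a few lines of Python. This is a rough sketch under stated assumptions: can_connect is a hypothetical helper, and the host name and port 8440 are simply the ones that appear in the agent log above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return (resolved_ip, reachable) for a TCP connection attempt.

    Hypothetical helper: resolved_ip is None when the name cannot be
    resolved (for example, a missing /etc/hosts entry).
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.error:
        return None, False
    try:
        conn = socket.create_connection((ip, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, filtered by a firewall, or timed out.
        return ip, False
    conn.close()
    return ip, True
```

Calling can_connect("ambari-server.mydomain.com", 8440) on the agent host should return the server's IP and True. In the log above the agent does reach the server on that port, so this mainly helps rule out the network before looking at permissions.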

Re: ambari-agent (non-root) disconnects from ambari-server (non-root) (LXC instance)

Super Mentor


@Anshuman Mehta

  File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 341, in run_threads
    controller.get_status_commands_executor().kill("AGENT_STOPPED", can_relaunch=False)
AttributeError: 'NoneType' object has no attribute 'kill'

Based on the above error, it looks like your ambari-agent might not be installed properly, or it might have missing or outdated scripts.

So please check the ambari-agent package version on the problematic host to confirm that the agent version is correct.

If needed, please reinstall the ambari-agent.

# rpm -qa | grep ambari-agent
# yum clean all
# yum reinstall ambari-agent -y
