Created 09-11-2018 02:13 PM
I've installed Ambari Server and followed all the prerequisite steps. When trying to create an HDF 3.2 cluster via the Ambari wizard, the Ambari Agent is installed and started, but the registration step fails:
Creating target directory...
==========================
Command start time 2018-09-10 23:54:57
Connection to ip-10-40-145-105.ec2.internal closed.
SSH command execution finished
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:54:58
==========================
Copying ambari sudo script...
==========================
Command start time 2018-09-10 23:54:58
scp /var/lib/ambari-server/ambari-sudo.sh
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:54:58
==========================
Copying common functions script...
==========================
Command start time 2018-09-10 23:54:58
scp /usr/lib/ambari-server/lib/ambari_commons
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:54:58
==========================
Copying create-python-wrap script...
==========================
Command start time 2018-09-10 23:54:58
scp /var/lib/ambari-server/create-python-wrap.sh
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:54:59
==========================
Copying OS type check script...
==========================
Command start time 2018-09-10 23:54:59
scp /usr/lib/ambari-server/lib/ambari_server/os_check_type.py
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:54:59
==========================
Running create-python-wrap script...
==========================
Command start time 2018-09-10 23:54:59
Connection to ip-10-40-145-105.ec2.internal closed.
SSH command execution finished
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:54:59
==========================
Running OS type check...
==========================
Command start time 2018-09-10 23:54:59
Cluster primary/cluster OS family is redhat7 and local/current OS family is redhat7
Connection to ip-10-40-145-105.ec2.internal closed.
SSH command execution finished
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:54:59
==========================
Checking 'sudo' package on remote host...
==========================
Command start time 2018-09-10 23:54:59
Connection to ip-10-40-145-105.ec2.internal closed.
SSH command execution finished
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:55:00
==========================
Copying repo file to 'tmp' folder...
==========================
Command start time 2018-09-10 23:55:00
scp /etc/yum.repos.d/ambari.repo
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:55:00
==========================
Moving file to repo dir...
==========================
Command start time 2018-09-10 23:55:00
Connection to ip-10-40-145-105.ec2.internal closed.
SSH command execution finished
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:55:00
==========================
Changing permissions for ambari.repo...
==========================
Command start time 2018-09-10 23:55:00
Connection to ip-10-40-145-105.ec2.internal closed.
SSH command execution finished
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:55:01
==========================
Copying setup script file...
==========================
Command start time 2018-09-10 23:55:01
scp /usr/lib/ambari-server/lib/ambari_server/setupAgent.py
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:55:01
==========================
Running setup agent script...
==========================
Command start time 2018-09-10 23:55:01
("WARNING 2018-09-10 23:55:35,248 shell.py:822 - can not switch user for RUN_COMMAND.
WARNING 2018-09-10 23:55:35,352 shell.py:822 - can not switch user for RUN_COMMAND.
INFO 2018-09-10 23:55:35,456 main.py:311 - Agent not going to die gracefully, going to execute kill -9
WARNING 2018-09-10 23:55:35,456 shell.py:822 - can not switch user for RUN_COMMAND.
INFO 2018-09-10 23:55:35,460 main.py:322 - Agent stopped successfully by kill -9, exiting.
INFO 2018-09-10 23:55:35,460 ExitHelper.py:57 - Performing cleanup before exiting...
INFO 2018-09-10 23:55:35,461 AlertSchedulerHandler.py:159 - [AlertScheduler] Stopped the alert scheduler.
INFO 2018-09-10 23:55:35,461 AlertSchedulerHandler.py:159 - [AlertScheduler] Stopped the alert scheduler.
INFO 2018-09-10 23:55:35,740 main.py:155 - loglevel=logging.INFO
INFO 2018-09-10 23:55:35,742 Hardware.py:68 - Initializing host system information.
INFO 2018-09-10 23:55:35,746 Hardware.py:188 - Some mount points were ignored: /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/1000, /run/user/0
INFO 2018-09-10 23:55:35,762 Facter.py:202 - Directory: '/etc/resource_overrides' does not exist - it won't be used for gathering system resources.
INFO 2018-09-10 23:55:35,765 Hardware.py:73 - Host system information: {'kernel': 'Linux', 'domain': 'ec2.internal', 'physicalprocessorcount': 8, 'kernelrelease': '3.10.0-693.el7.x86_64', 'uptime_days': '0', 'memorytotal': 31962140, 'swapfree': '0.00 GB', 'memorysize': 31962140, 'osfamily': 'redhat', 'swapsize': '0.00 GB', 'processorcount': 8, 'netmask': '255.255.255.128', 'timezone': 'UTC', 'hardwareisa': 'x86_64', 'memoryfree': 31314048, 'operatingsystem': 'redhat', 'kernelmajversion': '3.10', 'kernelversion': '3.10.0', 'macaddress': '0A:97:71:30:53:26', 'operatingsystemrelease': '7.4', 'ipaddress': '10.40.145.105', 'hostname': 'ip-10-40-145-105', 'uptime_hours': '0', 'fqdn': 'ip-10-40-145-105.ec2.internal', 'id': 'root', 'architecture': 'x86_64', 'selinux': True, 'mounts': [{'available': '19599720', 'used': '1359492', 'percent': '7%', 'device': '/dev/nvme0n1p2', 'mountpoint': '/', 'type': 'xfs', 'size': '20959212'}, {'available': '927944', 'used': '2564', 'percent': '1%', 'device': '/dev/nvme3n1', 'mountpoint': '/db-repo', 'type': 'ext4', 'size': '999320'}, {'available': '24299724', 'used': '45080', 'percent': '1%', 'device': '/dev/nvme2n1', 'mountpoint': '/provenance-repo', 'type': 'ext4', 'size': '25671908'}, {'available': '48783816', 'used': '53272', 'percent': '1%', 'device': '/dev/nvme1n1', 'mountpoint': '/nifi-logs', 'type': 'ext4', 'size': '51474912'}, {'available': '97760160', 'used': '61464', 'percent': '1%', 'device': '/dev/nvme4n1', 'mountpoint': '/content-repo', 'type': 'ext4', 'size': '103080888'}, {'available': '48783816', 'used': '53272', 'percent': '1%', 'device': '/dev/nvme5n1', 'mountpoint': '/flowfile-repo', 'type': 'ext4', 'size': '51474912'}], 'hardwaremodel': 'x86_64', 'uptime_seconds': '1176', 'interfaces': 'eth0,lo'}
INFO 2018-09-10 23:55:35,767 DataCleaner.py:39 - Data cleanup thread started
INFO 2018-09-10 23:55:35,768 DataCleaner.py:120 - Data cleanup started
INFO 2018-09-10 23:55:35,768 DataCleaner.py:122 - Data cleanup finished
INFO 2018-09-10 23:55:35,798 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'ip-10-40-145-105.ec2.internal' using socket.getfqdn().
INFO 2018-09-10 23:55:35,803 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2018-09-10 23:55:35,805 main.py:481 - Connecting to Ambari server at https://ip-10-40-145-25.ec2.internal:8440 (10.40.145.25)
INFO 2018-09-10 23:55:35,806 NetUtil.py:61 - Connecting to https://ip-10-40-145-25.ec2.internal:8440/ca
", None)
Connection to ip-10-40-145-105.ec2.internal closed.
SSH command execution finished
host=ip-10-40-145-105.ec2.internal, exitcode=0
Command end time 2018-09-10 23:55:38
Registering with the server...
Registration with the server failed.
Created 09-11-2018 02:56 PM
Hi @Alex M,
From your error log, it seems agent registration is failing because of the following messages:
("WARNING 2018-09-10 23:55:35,248 shell.py:822 - can not switch user for RUN_COMMAND. WARNING 2018-09-10 23:55:35,352 shell.py:822 - can not switch user for RUN_COMMAND. INFO 2018-09-10 23:55:35,456 main.py:311 - Agent not going to die gracefully, going to execute kill -9 WARNING 2018-09-10 23:55:35,456 shell.py:822 - can not switch user for RUN_COMMAND.
Are you installing Ambari as a non-root user? If so, have you granted the necessary permissions?
Also, can you please try to install and register ambari-agent manually: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-administration/content/install_th...
and then proceed using the add-host wizard.
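For the manual route, the agent has to be told which server to register with. Here is a minimal sketch of that step, assuming the agent config lives at /etc/ambari-agent/conf/ambari-agent.ini and has a `hostname=` entry under `[server]` (check your own install). The helper name `set_server_host` is made up for illustration and is not part of Ambari.

```shell
# Hypothetical helper: point an ambari-agent.ini at a given Ambari server host.
# Only 'hostname=' lines inside the [server] section are rewritten.
set_server_host() {
  ini="$1"
  host="$2"
  tmp="$(mktemp)"
  awk -v host="$host" '
    /^\[/ { in_server = ($0 == "[server]") }   # track which section we are in
    in_server && /^hostname[ \t]*=/ { print "hostname=" host; next }
    { print }
  ' "$ini" > "$tmp" && cat "$tmp" > "$ini"     # cat keeps the original file mode
  rm -f "$tmp"
}
```

On the agent host you would call it (as root) with something like `set_server_host /etc/ambari-agent/conf/ambari-agent.ini ip-10-40-145-25.ec2.internal` and then run `ambari-agent restart`.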
Also, I assume your node can do passwordless SSH to ip-10-40-145-25.ec2.internal, which is your Ambari server host. (Even if it is the same host, the public key must be added to authorized_keys.)
cat id_rsa.pub >> authorized_keys
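If you end up re-running that command, a slightly more careful version might look like the sketch below. It assumes the usual OpenSSH layout (a `.ssh` directory containing `authorized_keys`) and adds the key only if it is not already there. The function name `add_key` is just an illustration.

```shell
# Hypothetical helper: append a public key to authorized_keys exactly once
# and tighten the permissions that sshd normally expects.
add_key() {
  pub="$1"        # path to the public key, e.g. ~/.ssh/id_rsa.pub
  ssh_dir="$2"    # target .ssh directory, e.g. ~/.ssh
  auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth"
  # -x whole line, -F fixed string, -f read the pattern from the key file
  grep -qxF -f "$pub" "$auth" || cat "$pub" >> "$auth"
  chmod 600 "$auth"
}
```

Usage would be `add_key ~/.ssh/id_rsa.pub ~/.ssh`, run as the user Ambari uses for SSH.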
Please see if this helps. If it did, please log in and accept this answer.
Created 09-11-2018 05:18 PM
Have you tried running the command with sudo? Are you running the installation as a user other than root?
Created 09-12-2018 12:43 AM
Thank you - running "cat id_rsa.pub >> authorized_keys" on Ambari Server did the trick