Ambari-Agent registration step fails on RHEL 7.4 EC2 while trying to create HDF 3.2 cluster

Contributor

I've installed Ambari Server and followed all prerequisite steps. When I try to create an HDF 3.2 cluster via the Ambari wizard, the Ambari Agent is installed and started, but the registration step fails:

Creating target directory...

==========================

Command start time 2018-09-10 23:54:57

Connection to ip-10-40-145-105.ec2.internal closed.

SSH command execution finished

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:54:58

==========================

Copying ambari sudo script...

==========================

Command start time 2018-09-10 23:54:58

scp /var/lib/ambari-server/ambari-sudo.sh

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:54:58

==========================

Copying common functions script...

==========================

Command start time 2018-09-10 23:54:58

scp /usr/lib/ambari-server/lib/ambari_commons

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:54:58

==========================

Copying create-python-wrap script...

==========================

Command start time 2018-09-10 23:54:58

scp /var/lib/ambari-server/create-python-wrap.sh

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:54:59

==========================

Copying OS type check script...

==========================

Command start time 2018-09-10 23:54:59

scp /usr/lib/ambari-server/lib/ambari_server/os_check_type.py

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:54:59

==========================

Running create-python-wrap script...

==========================

Command start time 2018-09-10 23:54:59

Connection to ip-10-40-145-105.ec2.internal closed.

SSH command execution finished

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:54:59

==========================

Running OS type check...

==========================

Command start time 2018-09-10 23:54:59

Cluster primary/cluster OS family is redhat7 and local/current OS family is redhat7

Connection to ip-10-40-145-105.ec2.internal closed.

SSH command execution finished

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:54:59

==========================

Checking 'sudo' package on remote host...

==========================

Command start time 2018-09-10 23:54:59

Connection to ip-10-40-145-105.ec2.internal closed.

SSH command execution finished

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:55:00

==========================

Copying repo file to 'tmp' folder...

==========================

Command start time 2018-09-10 23:55:00

scp /etc/yum.repos.d/ambari.repo

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:55:00

==========================

Moving file to repo dir...

==========================

Command start time 2018-09-10 23:55:00

Connection to ip-10-40-145-105.ec2.internal closed.

SSH command execution finished

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:55:00

==========================

Changing permissions for ambari.repo...

==========================

Command start time 2018-09-10 23:55:00

Connection to ip-10-40-145-105.ec2.internal closed.

SSH command execution finished

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:55:01

==========================

Copying setup script file...

==========================

Command start time 2018-09-10 23:55:01

scp /usr/lib/ambari-server/lib/ambari_server/setupAgent.py

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:55:01

==========================

Running setup agent script...

==========================

Command start time 2018-09-10 23:55:01

("WARNING 2018-09-10 23:55:35,248 shell.py:822 - can not switch user for RUN_COMMAND.

WARNING 2018-09-10 23:55:35,352 shell.py:822 - can not switch user for RUN_COMMAND.

INFO 2018-09-10 23:55:35,456 main.py:311 - Agent not going to die gracefully, going to execute kill -9

WARNING 2018-09-10 23:55:35,456 shell.py:822 - can not switch user for RUN_COMMAND.

INFO 2018-09-10 23:55:35,460 main.py:322 - Agent stopped successfully by kill -9, exiting.

INFO 2018-09-10 23:55:35,460 ExitHelper.py:57 - Performing cleanup before exiting...

INFO 2018-09-10 23:55:35,461 AlertSchedulerHandler.py:159 - [AlertScheduler] Stopped the alert scheduler.

INFO 2018-09-10 23:55:35,461 AlertSchedulerHandler.py:159 - [AlertScheduler] Stopped the alert scheduler.

INFO 2018-09-10 23:55:35,740 main.py:155 - loglevel=logging.INFO

INFO 2018-09-10 23:55:35,742 Hardware.py:68 - Initializing host system information.

INFO 2018-09-10 23:55:35,746 Hardware.py:188 - Some mount points were ignored: /dev, /dev/shm, /run, /sys/fs/cgroup, /run/user/1000, /run/user/0

INFO 2018-09-10 23:55:35,762 Facter.py:202 - Directory: '/etc/resource_overrides' does not exist - it won't be used for gathering system resources.

INFO 2018-09-10 23:55:35,765 Hardware.py:73 - Host system information: {'kernel': 'Linux', 'domain': 'ec2.internal', 'physicalprocessorcount': 8, 'kernelrelease': '3.10.0-693.el7.x86_64', 'uptime_days': '0', 'memorytotal': 31962140, 'swapfree': '0.00 GB', 'memorysize': 31962140, 'osfamily': 'redhat', 'swapsize': '0.00 GB', 'processorcount': 8, 'netmask': '255.255.255.128', 'timezone': 'UTC', 'hardwareisa': 'x86_64', 'memoryfree': 31314048, 'operatingsystem': 'redhat', 'kernelmajversion': '3.10', 'kernelversion': '3.10.0', 'macaddress': '0A:97:71:30:53:26', 'operatingsystemrelease': '7.4', 'ipaddress': '10.40.145.105', 'hostname': 'ip-10-40-145-105', 'uptime_hours': '0', 'fqdn': 'ip-10-40-145-105.ec2.internal', 'id': 'root', 'architecture': 'x86_64', 'selinux': True, 'mounts': [{'available': '19599720', 'used': '1359492', 'percent': '7%', 'device': '/dev/nvme0n1p2', 'mountpoint': '/', 'type': 'xfs', 'size': '20959212'}, {'available': '927944', 'used': '2564', 'percent': '1%', 'device': '/dev/nvme3n1', 'mountpoint': '/db-repo', 'type': 'ext4', 'size': '999320'}, {'available': '24299724', 'used': '45080', 'percent': '1%', 'device': '/dev/nvme2n1', 'mountpoint': '/provenance-repo', 'type': 'ext4', 'size': '25671908'}, {'available': '48783816', 'used': '53272', 'percent': '1%', 'device': '/dev/nvme1n1', 'mountpoint': '/nifi-logs', 'type': 'ext4', 'size': '51474912'}, {'available': '97760160', 'used': '61464', 'percent': '1%', 'device': '/dev/nvme4n1', 'mountpoint': '/content-repo', 'type': 'ext4', 'size': '103080888'}, {'available': '48783816', 'used': '53272', 'percent': '1%', 'device': '/dev/nvme5n1', 'mountpoint': '/flowfile-repo', 'type': 'ext4', 'size': '51474912'}], 'hardwaremodel': 'x86_64', 'uptime_seconds': '1176', 'interfaces': 'eth0,lo'}

INFO 2018-09-10 23:55:35,767 DataCleaner.py:39 - Data cleanup thread started

INFO 2018-09-10 23:55:35,768 DataCleaner.py:120 - Data cleanup started

INFO 2018-09-10 23:55:35,768 DataCleaner.py:122 - Data cleanup finished

INFO 2018-09-10 23:55:35,798 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'ip-10-40-145-105.ec2.internal' using socket.getfqdn().

INFO 2018-09-10 23:55:35,803 PingPortListener.py:50 - Ping port listener started on port: 8670

INFO 2018-09-10 23:55:35,805 main.py:481 - Connecting to Ambari server at https://ip-10-40-145-25.ec2.internal:8440 (10.40.145.25)

INFO 2018-09-10 23:55:35,806 NetUtil.py:61 - Connecting to https://ip-10-40-145-25.ec2.internal:8440/ca

", None)

Connection to ip-10-40-145-105.ec2.internal closed.

SSH command execution finished

host=ip-10-40-145-105.ec2.internal, exitcode=0

Command end time 2018-09-10 23:55:38

Registering with the server...

Registration with the server failed.

1 ACCEPTED SOLUTION


Hi @Alex M,

Based on your error log, agent registration appears to be failing because of the following messages:

WARNING 2018-09-10 23:55:35,248 shell.py:822 - can not switch user for RUN_COMMAND.
WARNING 2018-09-10 23:55:35,352 shell.py:822 - can not switch user for RUN_COMMAND.
INFO 2018-09-10 23:55:35,456 main.py:311 - Agent not going to die gracefully, going to execute kill -9
WARNING 2018-09-10 23:55:35,456 shell.py:822 - can not switch user for RUN_COMMAND.

Are you installing Ambari as a non-root user? If so, have you granted the necessary permissions?
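A quick way to answer that question on the agent host is sketched below. It only reports whether the current user is root, has passwordless sudo, or has neither; the function name `check_privileges` is made up for illustration and is not part of Ambari.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Ambari tool): report the privilege level
# of the user who would run the agent installation.
check_privileges() {
  if [ "$(id -u)" -eq 0 ]; then
    # Effective UID 0 means we are root.
    echo "root"
  elif sudo -n true 2>/dev/null; then
    # "sudo -n" never prompts, so it only succeeds with passwordless sudo.
    echo "sudo"
  else
    echo "none"
  fi
}

# Usage: run as the same user that Ambari uses for SSH.
#   check_privileges
```

If this prints "none", the non-root setup is probably incomplete and the sudo configuration for that user is worth checking first.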

Also, could you try installing and registering the ambari-agent manually: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-administration/content/install_th...

and then proceed using the add-host wizard.
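For reference, a minimal sketch of the manual route, assuming the agent's config file is at the default location /etc/ambari-agent/conf/ambari-agent.ini and that the server host is the one shown in your log. The helper `set_ambari_server_host` is a hypothetical name used only here; it rewrites the hostname= entry in the [server] section so the agent knows where to register.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the agent at the Ambari server by editing
# the hostname= line inside the [server] section of the given ini file.
set_ambari_server_host() {
  local ini_file="$1"     # path to ambari-agent.ini
  local server_host="$2"  # FQDN of the Ambari server
  # Restrict the substitution to the range from [server] up to the next
  # section header, so hostname= lines in other sections are untouched.
  sed -i "/^\[server\]/,/^\[/ s/^hostname=.*/hostname=${server_host}/" "$ini_file"
}

# Typical sequence on the agent host (assumed default paths, run as root):
#   yum install -y ambari-agent
#   set_ambari_server_host /etc/ambari-agent/conf/ambari-agent.ini ip-10-40-145-25.ec2.internal
#   ambari-agent start
```

Once the agent is running and pointed at the server, the host can be added through the wizard with the manual registration option instead of SSH.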

I am also assuming that passwordless SSH works to ip-10-40-145-25.ec2.internal, which is your Ambari server host. Even if the server and the agent are on the same host, the public key still has to be added to authorized_keys:

cat id_rsa.pub >> authorized_keys

Refer to: https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.0.0/bk_ambari-installation/content/set_up_passw...
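A slightly more careful version of that command is sketched here. It assumes an existing key pair and takes the .ssh directory and public key path as arguments; the function name `authorize_key` is hypothetical. It skips the append when the key is already present and sets the permissions that sshd normally expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a public key to authorized_keys exactly once
# and apply the usual restrictive permissions.
authorize_key() {
  local ssh_dir="$1"   # e.g. "$HOME/.ssh"
  local pub_key="$2"   # e.g. "$HOME/.ssh/id_rsa.pub"
  local auth_file="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth_file"

  # Append only if the exact key line is not already in the file.
  if ! grep -qxF -- "$(cat "$pub_key")" "$auth_file"; then
    cat "$pub_key" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}

# Usage on the Ambari server host, then a non-interactive login check:
#   authorize_key "$HOME/.ssh" "$HOME/.ssh/id_rsa.pub"
#   ssh -o BatchMode=yes ip-10-40-145-25.ec2.internal true && echo "passwordless SSH OK"
```

BatchMode=yes makes ssh fail instead of prompting for a password, so the check only succeeds when key-based login really works.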

Please check whether this helps. If it does, please log in and accept this answer.


3 REPLIES

Master Mentor

@Alex M

Have you tried running the command with sudo? Are you running the installation as a user other than root?

Contributor
@Akhil S Naik

Thank you. Running "cat id_rsa.pub >> authorized_keys" on the Ambari Server did the trick.