Unable to join Ambari server to cluster

Explorer

Step 1: Get the RSA private key from the Ambari host.

# cat /root/.ssh/id_rsa

Step 2: In the Ambari web UI, choose Hosts, then Action > Add Host. Enter ambari.example.com in the Target Hosts field.

Paste the key from step 1 into the field that asks for the SSH private key.

Step 3: Click the Register and Confirm button.

It returns Failed:

==========================
Creating target directory...
==========================
Command start time 2019-02-05 14:23:08
root@ambari.example.com: Permission denied (publickey,password).

SSH command execution finished
host=ambari.example.com, exitcode=255

Command end time 2019-02-05 14:23:08

ERROR: Bootstrap of host ambari.example.com fails because previous action finished with non-zero exit code (255)

ERROR MESSAGE: root@ambari.example.com: Permission denied (publickey,password).

STDOUT: 
root@ambari.example.com: Permission denied (publickey,password).
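In a long registration log, the step that failed is the one whose "host=..., exitcode=N" line has a non-zero N. A minimal sketch for pulling those lines out, assuming the log keeps the exact "host=<name>, exitcode=<n>" layout shown above; the function name `failed_steps` is hypothetical and not part of Ambari:

```python
import re

# Matches result lines such as "host=ambari.example.com, exitcode=255".
RESULT_RE = re.compile(r"^host=(?P<host>\S+), exitcode=(?P<code>-?\d+)$")


def failed_steps(log_text):
    """Return (host, exit_code) for every step that ended with a non-zero exit code."""
    failures = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match and int(match.group("code")) != 0:
            failures.append((match.group("host"), int(match.group("code"))))
    return failures


sample = """\
SSH command execution finished
host=ambari.example.com, exitcode=255
Command end time 2019-02-05 14:23:08
"""
print(failed_steps(sample))
```

Exit code 255 is what the ssh client itself returns when the connection or authentication fails, which matches the "Permission denied (publickey,password)" message.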
	
1 ACCEPTED SOLUTION

Super Mentor

@Tom Burke

Are you checking what command you are executing with curl, and what each curl argument means?

If you had checked the content of "hosts4.json" before attaching it, you would have seen that the credentials you entered in the curl command are the wrong Ambari admin credentials.

# cat hosts4.json 
{
  "status": 403,
  "message": "Unable to sign in. Invalid username/password combination."
}
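A quick way to catch this before attaching a file is to check whether the saved body is an error payload rather than the expected data. A minimal sketch, assuming error responses carry a numeric "status" and a "message" field as in the output above; `api_error` is a hypothetical helper name:

```python
import json


def api_error(text):
    """Return an error message if the saved body looks like an error payload, else None."""
    try:
        body = json.loads(text)
    except ValueError:
        return "not valid JSON"
    # Error payloads seen in this thread have a numeric HTTP status and a message.
    if isinstance(body, dict) and isinstance(body.get("status"), int) and body["status"] >= 400:
        return body.get("message", "unknown error")
    return None


saved = '{"status": 403, "message": "Unable to sign in. Invalid username/password combination."}'
print(api_error(saved))
```

To check a real file, read it first, for example `api_error(open("hosts4.json").read())`.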

.

As I do not know your Ambari admin credentials, I gave you a dummy curl command and expected that you would change the values to match your cluster.


29 REPLIES

Super Mentor

@Tom Burke

1. Please make sure that the permissions on these files are correct, as follows:

# ls -lart /root/.ssh/
total 20
-rw-r--r--. 1 root root  409 Jul 21  2018 id_rsa.pub
-rw-------. 1 root root 1679 Jul 21  2018 id_rsa

.

2. Also please make sure that the id_rsa files were generated on the host after fixing the hostname. (If the host had multiple hostnames earlier, please regenerate the keys.)

Also make sure to copy this public key to the mentioned host (and all cluster hosts):

# ssh-copy-id -i ~/.ssh/id_rsa.pub root@ambari.example.com

.

3. As a quick check: if passwordless SSH is already set up, you should be able to SSH without entering a password.

# ssh root@ambari.example.com
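The permission check in step 1 can also be scripted. A minimal sketch, assuming the goal is a private key readable by its owner only (as in the -rw------- listing above); `key_permission_problems` is a hypothetical helper, not an Ambari or OpenSSH tool:

```python
import os
import stat


def key_permission_problems(path):
    """Return a list of problems with a private key file's mode; empty if none found."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    problems = []
    # Any group or other permission bit on a private key is too open.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("%s is accessible by group/other (mode %o)" % (path, mode))
    if not mode & stat.S_IRUSR:
        problems.append("%s is not readable by its owner (mode %o)" % (path, mode))
    return problems


if __name__ == "__main__":
    key = os.path.expanduser("~/.ssh/id_rsa")
    if os.path.exists(key):
        print(key_permission_problems(key) or "permissions look fine")
```

An empty list only says the mode bits look right; it does not confirm that the matching public key is in authorized_keys on the target host.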


Explorer

Hi Jay

I was missing the ssh-copy-id -i ~/.ssh/id_rsa.pub root@ambari.example.com step.

I ran it and tested passwordless root login to the Ambari box; that works.

Registration still returned Failed, but now with the following (thanks again!):

==========================
Creating target directory...
==========================


Command start time 2019-02-05 15:14:41
chmod: cannot access '/var/lib/ambari-agent/data': No such file or directory


Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:42


==========================
Copying ambari sudo script...
==========================


Command start time 2019-02-05 15:14:42


scp /var/lib/ambari-server/ambari-sudo.sh
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:42


==========================
Copying common functions script...
==========================


Command start time 2019-02-05 15:14:42


scp /usr/lib/ambari-server/lib/ambari_commons
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:43


==========================
Copying create-python-wrap script...
==========================


Command start time 2019-02-05 15:14:43


scp /var/lib/ambari-server/create-python-wrap.sh
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:43


==========================
Copying OS type check script...
==========================


Command start time 2019-02-05 15:14:43


scp /usr/lib/ambari-server/lib/ambari_server/os_check_type.py
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:44


==========================
Running create-python-wrap script...
==========================


Command start time 2019-02-05 15:14:44


Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:45


==========================
Running OS type check...
==========================


Command start time 2019-02-05 15:14:45
Cluster primary/cluster OS family is ubuntu18 and local/current OS family is ubuntu18


Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:45


==========================
Checking 'sudo' package on remote host...
==========================


Command start time 2019-02-05 15:14:45


Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:46


==========================
Copying repo file to 'tmp' folder...
==========================


Command start time 2019-02-05 15:14:46


scp /etc/apt/sources.list.d/ambari.list
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:46


==========================
Moving file to repo dir...
==========================


Command start time 2019-02-05 15:14:46


Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:47


==========================
Changing permissions for ambari.repo...
==========================


Command start time 2019-02-05 15:14:47


Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:47


==========================
Update apt cache of repository...
==========================


Command start time 2019-02-05 15:14:47


0% [Working]
0% [Working]
            
Get:1 http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0 Ambari InRelease [3,187 B]


0% [1 InRelease 3,187 B/3,187 B 100%]
                                     

Super Mentor

@Tom Burke

This time I do not see it failing with the previous error.

But it is strange that you are planning to use such an old Ambari release, 2.2.1.0, which I see in your output:

Get:1 http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0 Ambari InRelease [3,187 B]

.

Is there any specific reason for choosing such an old Ambari version?

The latest Ambari release is 2.7.3.
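The OS family and version that a repo line points at can be read straight from its URL. A minimal sketch, assuming repo URLs follow the ".../ambari/<os>/.../updates/<version>" layout seen in the output; `repo_info` is a hypothetical helper name:

```python
import re

# Captures the OS directory and the trailing version of an Ambari repo URL.
REPO_RE = re.compile(r"/ambari/(?P<os>[^/\s]+)/\S*?/updates/(?P<version>\d+(?:\.\d+)+)")


def repo_info(apt_line):
    """Return (os_name, version_tuple) parsed from an apt output line, or None."""
    match = REPO_RE.search(apt_line)
    if not match:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("os"), version


line = ("Get:1 http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0 "
        "Ambari InRelease [3,187 B]")
os_name, version = repo_info(line)
print(os_name, version, version < (2, 7, 3))
```

For the line above this reports ubuntu14 and 2.2.1.0, while the OS type check earlier in the same log reported ubuntu18, so the repo file being copied to the host looks stale in both respects.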

.

Explorer

Hi Jay - No particular reason. I inherited the cluster install, which was made from a recent download (about 3 weeks ago, from your site), so I assumed it was pretty new.

The Ambari machine still does not show in Hosts.

Explorer

Could our problem be this old version?

Super Mentor

@Tom Burke

Can you also please check on the agent host why it is referring to the old Ambari version 2.2.1.0?

On the problematic host, you can try running this command to find out which version it is using:

# dpkg --list | grep ambari

.

Explorer

ambari-server 2.7.3.0-139 amd64 Ambari Server

Super Mentor

@Tom Burke

I meant: on the host where the agent registration is failing, can you check where the Ambari 2.2.1.0 binaries are coming from? We saw this in your output:

http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0 Ambari 

.

Or can you please share the "ambari-agent.log" from the host where the agent setup is failing?

"/var/log/ambari-agent/ambari-agent.log"

Explorer

The host I cannot join to the cluster is the Ambari server itself. From my reading of the docs, I thought this was needed in order to run the Enable Kerberos wizard, but from your previous comment it seems it is not. I can run the command regardless; here is the output from one member, compute1:

root@compute1:~# dpkg --list | grep ambari

ii ambari-agent 2.7.3.0-139 amd64 Ambari Agent

ii ambari-infra-solr-client 2.7.3.0-139 amd64 [[description]]

ii ambari-metrics-assembly 2.7.3.0-139 amd64 Ambari Metrics Assembly
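When comparing several hosts, it can help to reduce this output to a package-to-version map. A minimal sketch, assuming the whitespace-separated "ii <name> <version> <arch> <description>" layout shown above; `ambari_packages` is a hypothetical helper name:

```python
def ambari_packages(dpkg_output):
    """Map package name -> version for installed ('ii') packages named ambari*."""
    packages = {}
    for line in dpkg_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) >= 3 and fields[0] == "ii" and fields[1].startswith("ambari"):
            packages[fields[1]] = fields[2]
    return packages


sample = """\
ii ambari-agent 2.7.3.0-139 amd64 Ambari Agent
ii ambari-infra-solr-client 2.7.3.0-139 amd64 [[description]]
ii ambari-metrics-assembly 2.7.3.0-139 amd64 Ambari Metrics Assembly
"""
found = ambari_packages(sample)
print(found)
print("single version installed:", len(set(found.values())) == 1)
```

Here every package is at 2.7.3.0-139, so the 2.2.1.0 reference does not come from the packages installed on compute1.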

Super Mentor

@Tom Burke

It looks like the Ambari agent is already installed on your Ambari server host. So just try to start it and see whether it starts without any error.

# ambari-agent start;  tail -f /var/log/ambari-agent/ambari-agent.log

.

Explorer

Hello Jay, here is the log file from compute1:

INFO 2019-02-05 16:14:41,688 security.py:135 - Event to server at /reports/host_status (correlation_id=74643): {'agentEnv': {'transparentHugePage': 'madvise', 'hostHealth': {'agentTimeStampAtReporting': 1549412081679, 'liveServices': [{'status': 'Healthy', 'name': 'ntp or chrony', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': True, 'alternatives': [], 'firewallName': 'ufw', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '5028752436', 'used': '28126224', 'percent': '1%', 'device': '/dev/sda2', 'mountpoint': '/', 'type': 'ext4', 'size': '5325330344'}]}
INFO 2019-02-05 16:14:41,690 __init__.py:82 - Event from server at /user/ (correlation_id=74643): {u'status': u'OK'}
INFO 2019-02-05 16:14:50,711 security.py:135 - Event to server at /heartbeat (correlation_id=74644): {'id': 59869}
INFO 2019-02-05 16:14:50,713 __init__.py:82 - Event from server at /user/ (correlation_id=74644): {u'status': u'OK', u'id': 59870}
INFO 2019-02-05 16:15:00,715 security.py:135 - Event to server at /heartbeat (correlation_id=74645): {'id': 59870}
INFO 2019-02-05 16:15:00,718 __init__.py:82 - Event from server at /user/ (correlation_id=74645): {u'status': u'OK', u'id': 59871}
INFO 2019-02-05 16:15:10,719 security.py:135 - Event to server at /heartbeat (correlation_id=74646): {'id': 59871}
INFO 2019-02-05 16:15:10,724 __init__.py:82 - Event from server at /user/ (correlation_id=74646): {u'status': u'OK', u'id': 59872}
INFO 2019-02-05 16:15:20,729 security.py:135 - Event to server at /heartbeat (correlation_id=74647): {'id': 59872}
INFO 2019-02-05 16:15:20,731 __init__.py:82 - Event from server at /user/ (correlation_id=74647): {u'status': u'OK', u'id': 59873}
INFO 2019-02-05 16:15:30,733 security.py:135 - Event to server at /heartbeat (correlation_id=74648): {'id': 59873}
INFO 2019-02-05 16:15:30,734 __init__.py:82 - Event from server at /user/ (correlation_id=74648): {u'status': u'OK', u'id': 59874}
INFO 2019-02-05 16:15:40,735 security.py:135 - Event to server at /heartbeat (correlation_id=74649): {'id': 59874}
INFO 2019-02-05 16:15:40,736 __init__.py:82 - Event from server at /user/ (correlation_id=74649): {u'status': u'OK', u'id': 59875}
INFO 2019-02-05 16:15:41,938 Hardware.py:188 - Some mount points were ignored: /dev, /run, /dev/shm, /run/lock, /sys/fs/cgroup, /snap/core/6130, /run/user/1003, /snap/core/6259, /snap/core/6350, /run/user/1008, /run/user/1019, /run/user/1013, /run/user/1021, /run/user/1015, /run/user/1023, /run/user/0
INFO 2019-02-05 16:15:41,938 security.py:135 - Event to server at /reports/host_status (correlation_id=74650): {'agentEnv': {'transparentHugePage': 'madvise', 'hostHealth': {'agentTimeStampAtReporting': 1549412141929, 'liveServices': [{'status': 'Healthy', 'name': 'ntp or chrony', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': True, 'alternatives': [], 'firewallName': 'ufw', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '5028752424', 'used': '28126236', 'percent': '1%', 'device': '/dev/sda2', 'mountpoint': '/', 'type': 'ext4', 'size': '5325330344'}]}
INFO 2019-02-05 16:15:41,940 __init__.py:82 - Event from server at /user/ (correlation_id=74650): {u'status': u'OK'}
INFO 2019-02-05 16:15:50,737 security.py:135 - Event to server at /heartbeat (correlation_id=74651): {'id': 59875}
INFO 2019-02-05 16:15:50,738 __init__.py:82 - Event from server at /user/ (correlation_id=74651): {u'status': u'OK', u'id': 59876}
INFO 2019-02-05 16:16:00,739 security.py:135 - Event to server at /heartbeat (correlation_id=74652): {'id': 59876}
INFO 2019-02-05 16:16:00,740 __init__.py:82 - Event from server at /user/ (correlation_id=74652): {u'status': u'OK', u'id': 59877}
INFO 2019-02-05 16:16:08,628 security.py:135 - Event to server at /reports/alerts_status (correlation_id=74653): [{'name': u'datanode_storage', 'timestamp': 1549412167525L, 'clusterId': '2', 'definitionId': 19, 'state': 'OK', 'text': '...'}, {'name': u'datanode_heap_usage', 'timestamp': 1549412167519L, 'clusterId': '2', 'definitionId': 10, 'state': 'OK', 'text': '...'}]
INFO 2019-02-05 16:16:08,630 __init__.py:82 - Event from server at /user/ (correlation_id=74653): {u'status': u'OK'}
INFO 2019-02-05 16:16:10,743 security.py:135 - Event to server at /heartbeat (correlation_id=74654): {'id': 59877}
INFO 2019-02-05 16:16:10,744 __init__.py:82 - Event from server at /user/ (correlation_id=74654): {u'status': u'OK', u'id': 59878}
INFO 2019-02-05 16:16:20,745 security.py:135 - Event to server at /heartbeat (correlation_id=74655): {'id': 59878}
INFO 2019-02-05 16:16:20,746 __init__.py:82 - Event from server at /user/ (correlation_id=74655): {u'status': u'OK', u'id': 59879}
INFO 2019-02-05 16:16:30,747 security.py:135 - Event to server at /heartbeat (correlation_id=74656): {'id': 59879}
INFO 2019-02-05 16:16:30,748 __init__.py:82 - Event from server at /user/ (correlation_id=74656): {u'status': u'OK', u'id': 59880}
INFO 2019-02-05 16:16:40,751 security.py:135 - Event to server at /heartbeat (correlation_id=74657): {'id': 59880}
INFO 2019-02-05 16:16:40,752 __init__.py:82 - Event from server at /user/ (correlation_id=74657): {u'status': u'OK', u'id': 59881}
INFO 2019-02-05 16:16:42,184 Hardware.py:188 - Some mount points were ignored: /dev, /run, /dev/shm, /run/lock, /sys/fs/cgroup, /snap/core/6130, /run/user/1003, /snap/core/6259, /snap/core/6350, /run/user/1008, /run/user/1019, /run/user/1013, /run/user/1021, /run/user/1015, /run/user/1023, /run/user/0
INFO 2019-02-05 16:16:42,184 security.py:135 - Event to server at /reports/host_status (correlation_id=74658): {'agentEnv': {'transparentHugePage': 'madvise', 'hostHealth': {'agentTimeStampAtReporting': 1549412202175, 'liveServices': [{'status': 'Healthy', 'name': 'ntp or chrony', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': True, 'alternatives': [], 'firewallName': 'ufw', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '5028752404', 'used': '28126256', 'percent': '1%', 'device': '/dev/sda2', 'mountpoint': '/', 'type': 'ext4', 'size': '5325330344'}]}
INFO 2019-02-05 16:16:42,186 __init__.py:82 - Event from server at /user/ (correlation_id=74658): {u'status': u'OK'}
INFO 2019-02-05 16:16:50,753 security.py:135 - Event to server at /heartbeat (correlation_id=74659): {'id': 59881}
INFO 2019-02-05 16:16:50,755 __init__.py:82 - Event from server at /user/ (correlation_id=74659): {u'status': u'OK', u'id': 59882}
INFO 2019-02-05 16:17:00,757 security.py:135 - Event to server at /heartbeat (correlation_id=74660): {'id': 59882}
INFO 2019-02-05 16:17:00,758 __init__.py:82 - Event from server at /user/ (correlation_id=74660): {u'status': u'OK', u'id': 59883}
INFO 2019-02-05 16:17:10,759 security.py:135 - Event to server at /heartbeat (correlation_id=74661): {'id': 59883}
INFO 2019-02-05 16:17:10,761 __init__.py:82 - Event from server at /user/ (correlation_id=74661): {u'status': u'OK', u'id': 59884}
INFO 2019-02-05 16:17:20,763 security.py:135 - Event to server at /heartbeat (correlation_id=74662): {'id': 59884}
INFO 2019-02-05 16:17:20,765 __init__.py:82 - Event from server at /user/ (correlation_id=74662): {u'status': u'OK', u'id': 59885}
INFO 2019-02-05 16:17:30,767 security.py:135 - Event to server at /heartbeat (correlation_id=74663): {'id': 59885}
INFO 2019-02-05 16:17:30,769 __init__.py:82 - Event from server at /user/ (correlation_id=74663): {u'status': u'OK', u'id': 59886}
INFO 2019-02-05 16:17:40,769 security.py:135 - Event to server at /heartbeat (correlation_id=74664): {'id': 59886}
INFO 2019-02-05 16:17:40,771 __init__.py:82 - Event from server at /user/ (correlation_id=74664): {u'status': u'OK', u'id': 59887}
INFO 2019-02-05 16:17:42,427 Hardware.py:188 - Some mount points were ignored: /dev, /run, /dev/shm, /run/lock, /sys/fs/cgroup, /snap/core/6130, /run/user/1003, /snap/core/6259, /snap/core/6350, /run/user/1008, /run/user/1019, /run/user/1013, /run/user/1021, /run/user/1015, /run/user/1023, /run/user/0
INFO 2019-02-05 16:17:42,427 security.py:135 - Event to server at /reports/host_status (correlation_id=74665): {'agentEnv': {'transparentHugePage': 'madvise', 'hostHealth': {'agentTimeStampAtReporting': 1549412262418, 'liveServices': [{'status': 'Healthy', 'name': 'ntp or chrony', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': True, 'alternatives': [], 'firewallName': 'ufw', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '5028752380', 'used': '28126280', 'percent': '1%', 'device': '/dev/sda2', 'mountpoint': '/', 'type': 'ext4', 'size': '5325330344'}]}
INFO 2019-02-05 16:17:42,429 __init__.py:82 - Event from server at /user/ (correlation_id=74665): {u'status': u'OK'}
INFO 2019-02-05 16:17:50,774 security.py:135 - Event to server at /heartbeat (correlation_id=74666): {'id': 59887}
INFO 2019-02-05 16:17:50,775 __init__.py:82 - Event from server at /user/ (correlation_id=74666): {u'status': u'OK', u'id': 59888}
INFO 2019-02-05 16:18:00,778 security.py:135 - Event to server at /heartbeat (correlation_id=74667): {'id': 59888}
INFO 2019-02-05 16:18:00,781 __init__.py:82 - Event from server at /user/ (correlation_id=74667): {u'status': u'OK', u'id': 59889}
INFO 2019-02-05 16:18:08,633 security.py:135 - Event to server at /reports/alerts_status (correlation_id=74668): [{'name': u'datanode_heap_usage', 'timestamp': 1549412287525L, 'clusterId': '2', 'definitionId': 10, 'state': 'OK', 'text': '...'}, {'name': u'datanode_storage', 'timestamp': 1549412287526L, 'clusterId': '2', 'definitionId': 19, 'state': 'OK', 'text': '...'}]
INFO 2019-02-05 16:18:08,635 __init__.py:82 - Event from server at /user/ (correlation_id=74668): {u'status': u'OK'}
INFO 2019-02-05 16:18:10,782 security.py:135 - Event to server at /heartbeat (correlation_id=74669): {'id': 59889}
INFO 2019-02-05 16:18:10,783 __init__.py:82 - Event from server at /user/ (correlation_id=74669): {u'status': u'OK', u'id': 59890}
INFO 2019-02-05 16:18:20,784 security.py:135 - Event to server at /heartbeat (correlation_id=74670): {'id': 59890}
INFO 2019-02-05 16:18:20,785 __init__.py:82 - Event from server at /user/ (correlation_id=74670): {u'status': u'OK', u'id': 59891}
INFO 2019-02-05 16:18:30,788 security.py:135 - Event to server at /heartbeat (correlation_id=74671): {'id': 59891}
INFO 2019-02-05 16:18:30,790 __init__.py:82 - Event from server at /user/ (correlation_id=74671): {u'status': u'OK', u'id': 59892}
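A log of this length is easier to judge with a summary than by eye. A minimal sketch that counts lines per log level and checks that heartbeat ids increase by one each time, assuming the line layout shown above; `summarize_agent_log` is a hypothetical helper, not part of the agent:

```python
import re

# Matches outgoing heartbeat events and captures the heartbeat id.
HEARTBEAT_RE = re.compile(
    r"Event to server at /heartbeat \(correlation_id=\d+\): \{'id': (\d+)\}")
LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def summarize_agent_log(text):
    """Return (count per level, heartbeat ids, pairs of ids that are not consecutive)."""
    levels = {}
    heartbeat_ids = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] in LEVELS:
            levels[parts[0]] = levels.get(parts[0], 0) + 1
        match = HEARTBEAT_RE.search(line)
        if match:
            heartbeat_ids.append(int(match.group(1)))
    gaps = [(a, b) for a, b in zip(heartbeat_ids, heartbeat_ids[1:]) if b != a + 1]
    return levels, heartbeat_ids, gaps


sample = """\
INFO 2019-02-05 16:14:50,711 security.py:135 - Event to server at /heartbeat (correlation_id=74644): {'id': 59869}
INFO 2019-02-05 16:15:00,715 security.py:135 - Event to server at /heartbeat (correlation_id=74645): {'id': 59870}
"""
print(summarize_agent_log(sample))
```

Only INFO lines and no gaps would indicate that this agent is registered and heartbeating normally, which is what the log from compute1 shows.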

Super Mentor

@Tom Burke

I think everything is fine, and I see no more errors in the UI operational logs or in the ambari-agent logs.

So it looks good to me.

Do you still see any issue?

Explorer

Actually, I cannot find the Ambari agent on the Ambari server host. The log I showed is from another member server.

So, just to back up a second: when I try to enable Kerberos, I get the hostname failure as the first error, and it says this happens on the Ambari server. This is what led me to the conclusion that I need to add the Ambari server to the cluster, which is why I opened that other question. Maybe this picture will help: horton-err-obfu-1.pdf

Explorer

I sort of expected to see the Ambari server listed in Hosts: no-ambari-in-hosts.png

Incorrect assumption? Anyhow, I suppose it does not matter, since you said it is not needed to enable Kerberos. We can put this to bed if you like, thanks!

Super Mentor

@Tom Burke

It looks like the FQDN is not set correctly on your failing node.

Please run the following commands to verify whether the FQDN is set up correctly. (The hostname and the FQDN are not the same thing.)

# python -c "import socket; print(socket.getfqdn())"
(OR)
# hostname -f
# hostname

If the commands return different values, please set the FQDN of your host correctly.
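The same comparison can be run in one step. A minimal sketch using only the standard library; `hostname_report` is a hypothetical helper name, and the address lookup is an extra check not mentioned in this thread:

```python
import socket


def hostname_report():
    """Compare the short hostname with the FQDN and try to resolve the FQDN."""
    short = socket.gethostname()
    fqdn = socket.getfqdn()
    try:
        address = socket.gethostbyname(fqdn)
    except socket.error:
        # The FQDN does not resolve from this host (DNS or /etc/hosts issue).
        address = None
    return {"hostname": short, "fqdn": fqdn, "match": short == fqdn, "address": address}


if __name__ == "__main__":
    for key, value in sorted(hostname_report().items()):
        print("%s: %s" % (key, value))
```

A mismatch is not always an error, since many hosts keep a short hostname and a longer FQDN; what matters is that the FQDN is the name the rest of the cluster uses for this host.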

You can find the details here about hostname and public_hostname: https://community.hortonworks.com/content/kbentry/42872/why-ambari-host-might-have-different-public-...

Super Mentor

@Tom Burke Regarding the SSL-related issue:

You will need to configure a truststore in Ambari Server and import the LDAP/AD certificate into that truststore to fix the following message:

Failed to connect to KDC - Failed to communicate with the Active Directory at ldaps://xxx.yyy.com:636: simple bind failed: xxx.yyy.com:636

.

Please see: https://community.hortonworks.com/content/supportkb/148572/failed-to-connect-to-kdc-make-sure-the-se...

Explorer

Hi Jay, sorry, but all 3 hostname commands return the same:

ambari.example.com

Explorer

Yes, I did the import to the truststore fine. It just does not work.

Super Mentor

@Tom Burke

You can also open the same URL from the browser where you are logged in to the Ambari UI.

http://ambari.example.com:8443/api/v1/clusters/cluster-name/hosts?fields=Hosts/ip,Hosts/host_name

.

(OR) You can use the following curl call to write the JSON output to a file such as "/tmp/hosts.json". Also please make sure to use the correct cluster name in the URL.

# curl -k -H "X-Requested-By: ambari" -u admin:admin "http://ambari.example.com:8443/api/v1/clusters/cluster-name/hosts?fields=Hosts/ip,Hosts/host_name" -o "/tmp/hosts.json"
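For reference, the same call can be sketched in Python. This is a minimal sketch, not the only way to do it: the base URL, cluster name, and admin:admin credentials are the placeholders from the curl command and must be replaced, the helper names are hypothetical, and the "items" / "Hosts" / "host_name" response layout is an assumption about what the hosts endpoint returns:

```python
import base64
import json
import urllib.request


def build_hosts_request(base_url, cluster, user, password):
    """Build (but do not send) the request that the curl command above makes."""
    url = "%s/api/v1/clusters/%s/hosts?fields=Hosts/ip,Hosts/host_name" % (
        base_url.rstrip("/"), cluster)
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)  # same as curl -u user:password
    request.add_header("X-Requested-By", "ambari")
    return request


def host_names(body):
    """List host names from a response body (assumed layout: items -> Hosts -> host_name)."""
    data = json.loads(body)
    return [item["Hosts"]["host_name"] for item in data.get("items", [])]


request = build_hosts_request("http://ambari.example.com:8443", "cluster-name", "admin", "admin")
print(request.full_url)

# Hypothetical response body, for illustration only.
example_body = '{"items": [{"Hosts": {"host_name": "compute1", "ip": "192.0.2.10"}}]}'
print(host_names(example_body))
```

To send the request, pass it to `urllib.request.urlopen(request)` and read the response; a 403 at that point means the credentials are wrong for this Ambari instance.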

.

Explorer

Hi Jay, thanks for the help. I am not certain what was asked last; do you want this output again? Attached: hosts-2.json
