Created on 02-05-2019 10:28 PM - edited 09-16-2022 07:07 AM
Step 1: Get the RSA private key from the Ambari host.
# cat /root/.ssh/id_rsa
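(If no key pair exists yet on the Ambari host, one can be generated first; this is a minimal sketch assuming the default /root/.ssh location.)
# ssh-keygen -t rsa -b 4096 -f /root/.ssh/id_rsa -N ""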
Step 2: In the Ambari Web UI choose Hosts > Actions > Add New Hosts and enter ambari.example.com in the Target Hosts field.
Paste the key from step 1 into the 'SSH Private Key' field.
Step 3: Click the Register and Confirm button.
It returns Failed:
==========================
Creating target directory...
==========================
Command start time 2019-02-05 14:23:08
root@ambari.example.com: Permission denied (publickey,password).
SSH command execution finished
host=ambari.example.com, exitcode=255
Command end time 2019-02-05 14:23:08

ERROR: Bootstrap of host ambari.example.com fails because previous action finished with non-zero exit code (255)
ERROR MESSAGE: root@ambari.example.com: Permission denied (publickey,password).
STDOUT: root@ambari.example.com: Permission denied (publickey,password).
OK
Created 02-06-2019 02:59 AM
Are you checking what command you are actually executing with curl, and what each curl argument means?
Before attaching the "hosts4.json", if you had checked the content of this file you would have seen that the credentials you are entering in the curl command are the wrong Ambari admin credentials.
# cat hosts4.json
{
  "status": 403,
  "message": "Unable to sign in. Invalid username/password combination."
}
.
As I do not know your Ambari admin credentials, I gave you a dummy curl command and expected that you would change the values according to your cluster.
Created 02-05-2019 10:34 PM
1. Please make sure that the permissions on these files are correct, as follows:
# ls -lart /root/.ssh/
total 20
-rw-r--r--. 1 root root 409 Jul 21 2018 id_rsa.pub
-rw-------. 1 root root 1679 Jul 21 2018 id_rsa
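If the permissions differ, they can be reset; a minimal sketch, assuming the standard /root/.ssh layout:
# chmod 700 /root/.ssh
# chmod 600 /root/.ssh/id_rsa
# chmod 644 /root/.ssh/id_rsa.pub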
.
2. Also please make sure that the id_rsa files were generated on the host after fixing the hostname (if the host had multiple hostnames earlier, please regenerate the keys).
Also make sure to copy this public key to the mentioned host (and all cluster hosts):
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@ambari.example.com
.
3. Also, as a quick check: if passwordless SSH is already set up, you should be able to SSH without entering a password the next time.
# ssh root@ambari.example.com
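To make that check non-interactive, you can also run ssh in batch mode (it fails instead of prompting for a password); a small sketch:
# ssh -o BatchMode=yes root@ambari.example.com hostname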
Created 02-05-2019 11:17 PM
Hi Jay
I was missing the ssh-copy-id -i ~/.ssh/id_rsa.pub root@ambari.example.com step.
I did the above and successfully tested password-less root login to the Ambari box.
Registration still returned Failed, but now with the following (thanks again!):
==========================
Creating target directory...
==========================
Command start time 2019-02-05 15:14:41
chmod: cannot access '/var/lib/ambari-agent/data': No such file or directory
Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:42

==========================
Copying ambari sudo script...
==========================
Command start time 2019-02-05 15:14:42
scp /var/lib/ambari-server/ambari-sudo.sh
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:42

==========================
Copying common functions script...
==========================
Command start time 2019-02-05 15:14:42
scp /usr/lib/ambari-server/lib/ambari_commons
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:43

==========================
Copying create-python-wrap script...
==========================
Command start time 2019-02-05 15:14:43
scp /var/lib/ambari-server/create-python-wrap.sh
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:43

==========================
Copying OS type check script...
==========================
Command start time 2019-02-05 15:14:43
scp /usr/lib/ambari-server/lib/ambari_server/os_check_type.py
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:44

==========================
Running create-python-wrap script...
==========================
Command start time 2019-02-05 15:14:44
Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:45

==========================
Running OS type check...
==========================
Command start time 2019-02-05 15:14:45
Cluster primary/cluster OS family is ubuntu18 and local/current OS family is ubuntu18
Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:45

==========================
Checking 'sudo' package on remote host...
==========================
Command start time 2019-02-05 15:14:45
Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:46

==========================
Copying repo file to 'tmp' folder...
==========================
Command start time 2019-02-05 15:14:46
scp /etc/apt/sources.list.d/ambari.list
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:46

==========================
Moving file to repo dir...
==========================
Command start time 2019-02-05 15:14:46
Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:47

==========================
Changing permissions for ambari.repo...
==========================
Command start time 2019-02-05 15:14:47
Connection to ambari.example.com closed.
SSH command execution finished
host=ambari.example.com, exitcode=0
Command end time 2019-02-05 15:14:47

==========================
Update apt cache of repository...
==========================
Command start time 2019-02-05 15:14:47
0% [Working] 0% [Working] Get:1 http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0 Ambari InRelease [3,187 B] 0% [1 InRelease 3,187 B/3,187 B 100%]
Created 02-05-2019 11:18 PM
This time I do not see it failing with the previous error.
But it is strange to see that you are planning to use the very old Ambari 2.2.1.0, as I see the following in your output:
Get:1 http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0 Ambari InRelease [3,187 B]
.
Is there any specific reason for choosing such an old Ambari version?
The latest Ambari release is 2.7.3.
.
Created 02-05-2019 11:53 PM
Hi Jay - No particular reason. I inherited a cluster install that was made from a recent download (about 3 weeks ago, from your site), so I assumed it was pretty new.
The Ambari machine still does not show up in Hosts?
Created 02-05-2019 11:53 PM
Could our problem be caused by this old version?
Created 02-05-2019 11:57 PM
Can you also please check on the agent host to find out why it is referring to the old Ambari version 2.2.1.0?
Maybe on the problematic host you can try running this command to find out which version it is using:
# dpkg --list | grep ambari
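To trace where the 2.2.1.0 repository reference comes from, you could also check the configured apt sources and the candidate version; a sketch (the repo file names may differ on your host):
# grep -ri ambari /etc/apt/sources.list /etc/apt/sources.list.d/
# apt-cache policy ambari-agent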
.
Created 02-06-2019 12:03 AM
ambari-server 2.7.3.0-139 amd64 Ambari Server
Created 02-06-2019 12:06 AM
I meant to say: on the host where the agent registration is failing, can you check where the Ambari 2.2.1.0 binaries are coming from? Because we saw the following in your output:
http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0 Ambari
.
Or can you please share the "ambari-agent.log" from the host where the agent setup is failing?
Created 02-06-2019 12:16 AM
The host I cannot join to the cluster is the Ambari server itself. From my reading of the docs I thought this was needed in order to run the Enable Kerberos wizard, but based on your previous comment it seems it is not. I can run the command regardless; here is the output from one member, compute1:
root@compute1:~# dpkg --list | grep ambari
ii ambari-agent 2.7.3.0-139 amd64 Ambari Agent
ii ambari-infra-solr-client 2.7.3.0-139 amd64 [[description]]
ii ambari-metrics-assembly 2.7.3.0-139 amd64 Ambari Metrics Assembly
Created 02-06-2019 12:18 AM
Looks like the Ambari agent is already installed on your Ambari server host. So just try to start it and then see if it starts fine without any errors.
# ambari-agent start; tail -f /var/log/ambari-agent/ambari-agent.log
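As an additional quick check (assuming the default agent configuration location), you can confirm the agent's status and which Ambari server hostname it is pointing to:
# ambari-agent status
# grep hostname /etc/ambari-agent/conf/ambari-agent.ini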
.
Created 02-06-2019 12:37 AM
Hello Jay, log file from compute1:
INFO 2019-02-05 16:14:41,688 security.py:135 - Event to server at /reports/host_status (correlation_id=74643): {'agentEnv': {'transparentHugePage': 'madvise', 'hostHealth': {'agentTimeStampAtReporting': 1549412081679, 'liveServices': [{'status': 'Healthy', 'name': 'ntp or chrony', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': True, 'alternatives': [], 'firewallName': 'ufw', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '5028752436', 'used': '28126224', 'percent': '1%', 'device': '/dev/sda2', 'mountpoint': '/', 'type': 'ext4', 'size': '5325330344'}]}
INFO 2019-02-05 16:14:41,690 __init__.py:82 - Event from server at /user/ (correlation_id=74643): {u'status': u'OK'}
INFO 2019-02-05 16:14:50,711 security.py:135 - Event to server at /heartbeat (correlation_id=74644): {'id': 59869}
INFO 2019-02-05 16:14:50,713 __init__.py:82 - Event from server at /user/ (correlation_id=74644): {u'status': u'OK', u'id': 59870}
INFO 2019-02-05 16:15:00,715 security.py:135 - Event to server at /heartbeat (correlation_id=74645): {'id': 59870}
INFO 2019-02-05 16:15:00,718 __init__.py:82 - Event from server at /user/ (correlation_id=74645): {u'status': u'OK', u'id': 59871}
INFO 2019-02-05 16:15:10,719 security.py:135 - Event to server at /heartbeat (correlation_id=74646): {'id': 59871}
INFO 2019-02-05 16:15:10,724 __init__.py:82 - Event from server at /user/ (correlation_id=74646): {u'status': u'OK', u'id': 59872}
INFO 2019-02-05 16:15:20,729 security.py:135 - Event to server at /heartbeat (correlation_id=74647): {'id': 59872}
INFO 2019-02-05 16:15:20,731 __init__.py:82 - Event from server at /user/ (correlation_id=74647): {u'status': u'OK', u'id': 59873}
INFO 2019-02-05 16:15:30,733 security.py:135 - Event to server at /heartbeat (correlation_id=74648): {'id': 59873}
INFO 2019-02-05 16:15:30,734 __init__.py:82 - Event from server at /user/ (correlation_id=74648): {u'status': u'OK', u'id': 59874}
INFO 2019-02-05 16:15:40,735 security.py:135 - Event to server at /heartbeat (correlation_id=74649): {'id': 59874}
INFO 2019-02-05 16:15:40,736 __init__.py:82 - Event from server at /user/ (correlation_id=74649): {u'status': u'OK', u'id': 59875}
INFO 2019-02-05 16:15:41,938 Hardware.py:188 - Some mount points were ignored: /dev, /run, /dev/shm, /run/lock, /sys/fs/cgroup, /snap/core/6130, /run/user/1003, /snap/core/6259, /snap/core/6350, /run/user/1008, /run/user/1019, /run/user/1013, /run/user/1021, /run/user/1015, /run/user/1023, /run/user/0
INFO 2019-02-05 16:15:41,938 security.py:135 - Event to server at /reports/host_status (correlation_id=74650): {'agentEnv': {'transparentHugePage': 'madvise', 'hostHealth': {'agentTimeStampAtReporting': 1549412141929, 'liveServices': [{'status': 'Healthy', 'name': 'ntp or chrony', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': True, 'alternatives': [], 'firewallName': 'ufw', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '5028752424', 'used': '28126236', 'percent': '1%', 'device': '/dev/sda2', 'mountpoint': '/', 'type': 'ext4', 'size': '5325330344'}]}
INFO 2019-02-05 16:15:41,940 __init__.py:82 - Event from server at /user/ (correlation_id=74650): {u'status': u'OK'}
INFO 2019-02-05 16:15:50,737 security.py:135 - Event to server at /heartbeat (correlation_id=74651): {'id': 59875}
INFO 2019-02-05 16:15:50,738 __init__.py:82 - Event from server at /user/ (correlation_id=74651): {u'status': u'OK', u'id': 59876}
INFO 2019-02-05 16:16:00,739 security.py:135 - Event to server at /heartbeat (correlation_id=74652): {'id': 59876}
INFO 2019-02-05 16:16:00,740 __init__.py:82 - Event from server at /user/ (correlation_id=74652): {u'status': u'OK', u'id': 59877}
INFO 2019-02-05 16:16:08,628 security.py:135 - Event to server at /reports/alerts_status (correlation_id=74653): [{'name': u'datanode_storage', 'timestamp': 1549412167525L, 'clusterId': '2', 'definitionId': 19, 'state': 'OK', 'text': '...'}, {'name': u'datanode_heap_usage', 'timestamp': 1549412167519L, 'clusterId': '2', 'definitionId': 10, 'state': 'OK', 'text': '...'}]
INFO 2019-02-05 16:16:08,630 __init__.py:82 - Event from server at /user/ (correlation_id=74653): {u'status': u'OK'}
INFO 2019-02-05 16:16:10,743 security.py:135 - Event to server at /heartbeat (correlation_id=74654): {'id': 59877}
INFO 2019-02-05 16:16:10,744 __init__.py:82 - Event from server at /user/ (correlation_id=74654): {u'status': u'OK', u'id': 59878}
INFO 2019-02-05 16:16:20,745 security.py:135 - Event to server at /heartbeat (correlation_id=74655): {'id': 59878}
INFO 2019-02-05 16:16:20,746 __init__.py:82 - Event from server at /user/ (correlation_id=74655): {u'status': u'OK', u'id': 59879}
INFO 2019-02-05 16:16:30,747 security.py:135 - Event to server at /heartbeat (correlation_id=74656): {'id': 59879}
INFO 2019-02-05 16:16:30,748 __init__.py:82 - Event from server at /user/ (correlation_id=74656): {u'status': u'OK', u'id': 59880}
INFO 2019-02-05 16:16:40,751 security.py:135 - Event to server at /heartbeat (correlation_id=74657): {'id': 59880}
INFO 2019-02-05 16:16:40,752 __init__.py:82 - Event from server at /user/ (correlation_id=74657): {u'status': u'OK', u'id': 59881}
INFO 2019-02-05 16:16:42,184 Hardware.py:188 - Some mount points were ignored: /dev, /run, /dev/shm, /run/lock, /sys/fs/cgroup, /snap/core/6130, /run/user/1003, /snap/core/6259, /snap/core/6350, /run/user/1008, /run/user/1019, /run/user/1013, /run/user/1021, /run/user/1015, /run/user/1023, /run/user/0
INFO 2019-02-05 16:16:42,184 security.py:135 - Event to server at /reports/host_status (correlation_id=74658): {'agentEnv': {'transparentHugePage': 'madvise', 'hostHealth': {'agentTimeStampAtReporting': 1549412202175, 'liveServices': [{'status': 'Healthy', 'name': 'ntp or chrony', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': True, 'alternatives': [], 'firewallName': 'ufw', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '5028752404', 'used': '28126256', 'percent': '1%', 'device': '/dev/sda2', 'mountpoint': '/', 'type': 'ext4', 'size': '5325330344'}]}
INFO 2019-02-05 16:16:42,186 __init__.py:82 - Event from server at /user/ (correlation_id=74658): {u'status': u'OK'}
INFO 2019-02-05 16:16:50,753 security.py:135 - Event to server at /heartbeat (correlation_id=74659): {'id': 59881}
INFO 2019-02-05 16:16:50,755 __init__.py:82 - Event from server at /user/ (correlation_id=74659): {u'status': u'OK', u'id': 59882}
INFO 2019-02-05 16:17:00,757 security.py:135 - Event to server at /heartbeat (correlation_id=74660): {'id': 59882}
INFO 2019-02-05 16:17:00,758 __init__.py:82 - Event from server at /user/ (correlation_id=74660): {u'status': u'OK', u'id': 59883}
INFO 2019-02-05 16:17:10,759 security.py:135 - Event to server at /heartbeat (correlation_id=74661): {'id': 59883}
INFO 2019-02-05 16:17:10,761 __init__.py:82 - Event from server at /user/ (correlation_id=74661): {u'status': u'OK', u'id': 59884}
INFO 2019-02-05 16:17:20,763 security.py:135 - Event to server at /heartbeat (correlation_id=74662): {'id': 59884}
INFO 2019-02-05 16:17:20,765 __init__.py:82 - Event from server at /user/ (correlation_id=74662): {u'status': u'OK', u'id': 59885}
INFO 2019-02-05 16:17:30,767 security.py:135 - Event to server at /heartbeat (correlation_id=74663): {'id': 59885}
INFO 2019-02-05 16:17:30,769 __init__.py:82 - Event from server at /user/ (correlation_id=74663): {u'status': u'OK', u'id': 59886}
INFO 2019-02-05 16:17:40,769 security.py:135 - Event to server at /heartbeat (correlation_id=74664): {'id': 59886}
INFO 2019-02-05 16:17:40,771 __init__.py:82 - Event from server at /user/ (correlation_id=74664): {u'status': u'OK', u'id': 59887}
INFO 2019-02-05 16:17:42,427 Hardware.py:188 - Some mount points were ignored: /dev, /run, /dev/shm, /run/lock, /sys/fs/cgroup, /snap/core/6130, /run/user/1003, /snap/core/6259, /snap/core/6350, /run/user/1008, /run/user/1019, /run/user/1013, /run/user/1021, /run/user/1015, /run/user/1023, /run/user/0
INFO 2019-02-05 16:17:42,427 security.py:135 - Event to server at /reports/host_status (correlation_id=74665): {'agentEnv': {'transparentHugePage': 'madvise', 'hostHealth': {'agentTimeStampAtReporting': 1549412262418, 'liveServices': [{'status': 'Healthy', 'name': 'ntp or chrony', 'desc': ''}]}, 'reverseLookup': True, 'umask': '18', 'hasUnlimitedJcePolicy': True, 'alternatives': [], 'firewallName': 'ufw', 'stackFoldersAndFiles': [], 'existingUsers': [], 'firewallRunning': False}, 'mounts': [{'available': '5028752380', 'used': '28126280', 'percent': '1%', 'device': '/dev/sda2', 'mountpoint': '/', 'type': 'ext4', 'size': '5325330344'}]}
INFO 2019-02-05 16:17:42,429 __init__.py:82 - Event from server at /user/ (correlation_id=74665): {u'status': u'OK'}
INFO 2019-02-05 16:17:50,774 security.py:135 - Event to server at /heartbeat (correlation_id=74666): {'id': 59887}
INFO 2019-02-05 16:17:50,775 __init__.py:82 - Event from server at /user/ (correlation_id=74666): {u'status': u'OK', u'id': 59888}
INFO 2019-02-05 16:18:00,778 security.py:135 - Event to server at /heartbeat (correlation_id=74667): {'id': 59888}
INFO 2019-02-05 16:18:00,781 __init__.py:82 - Event from server at /user/ (correlation_id=74667): {u'status': u'OK', u'id': 59889}
INFO 2019-02-05 16:18:08,633 security.py:135 - Event to server at /reports/alerts_status (correlation_id=74668): [{'name': u'datanode_heap_usage', 'timestamp': 1549412287525L, 'clusterId': '2', 'definitionId': 10, 'state': 'OK', 'text': '...'}, {'name': u'datanode_storage', 'timestamp': 1549412287526L, 'clusterId': '2', 'definitionId': 19, 'state': 'OK', 'text': '...'}]
INFO 2019-02-05 16:18:08,635 __init__.py:82 - Event from server at /user/ (correlation_id=74668): {u'status': u'OK'}
INFO 2019-02-05 16:18:10,782 security.py:135 - Event to server at /heartbeat (correlation_id=74669): {'id': 59889}
INFO 2019-02-05 16:18:10,783 __init__.py:82 - Event from server at /user/ (correlation_id=74669): {u'status': u'OK', u'id': 59890}
INFO 2019-02-05 16:18:20,784 security.py:135 - Event to server at /heartbeat (correlation_id=74670): {'id': 59890}
INFO 2019-02-05 16:18:20,785 __init__.py:82 - Event from server at /user/ (correlation_id=74670): {u'status': u'OK', u'id': 59891}
INFO 2019-02-05 16:18:30,788 security.py:135 - Event to server at /heartbeat (correlation_id=74671): {'id': 59891}
INFO 2019-02-05 16:18:30,790 __init__.py:82 - Event from server at /user/ (correlation_id=74671): {u'status': u'OK', u'id': 59892}
Created 02-06-2019 12:40 AM
I think everything is fine and I see no errors anymore in the UI operational logs or in the ambari-agent logs.
So it looks good to me.
Do you still see any issue?
Created 02-06-2019 12:41 AM
Actually, I cannot find the Ambari agent on the Ambari server host. The log I showed is from another member server.
So just to back up a second: when I try to enable Kerberos, I get the hostname failure as the first error, and it says this happens on the Ambari server. This is what led me to the conclusion that I need to add the Ambari server to the cluster, which is why I opened that other question. Maybe this attachment will help: horton-err-obfu-1.pdf
Created 02-06-2019 12:46 AM
I sort of expected to see the Ambari server listed in Hosts... no-ambari-in-hosts.png
Incorrect assumption? Anyhow, I suppose it does not matter since you said it is not needed to enable Kerberos. We can put this to bed if you like, thanks!
Created 02-06-2019 12:46 AM
Looks like the FQDN is not set correctly on your failing Node.
Please run the following commands to verify whether the FQDN is set up correctly (hostname and FQDN are not the same):
# python<<<"import socket;print socket.getfqdn();"
(OR)
# hostname -f
# hostname
If you find a difference in the FQDN then please set the FQDN of your host correctly.
You can find the details here about hostname and public_hostname: https://community.hortonworks.com/content/kbentry/42872/why-ambari-host-might-have-different-public-...
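If the FQDN turns out to be wrong, one way to correct it (a sketch, assuming a systemd-based Ubuntu host and that ambari.example.com is the desired FQDN; the IP below is a placeholder) is:
# hostnamectl set-hostname ambari.example.com
# echo "10.0.0.10 ambari.example.com ambari" >> /etc/hosts    # replace 10.0.0.10 with the host's real IP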
Created 02-06-2019 12:50 AM
@Tom Burke And regarding the SSL-related issue:
You will need to make sure that you configure a truststore in Ambari Server and import the LDAP/AD certificate into the Ambari Server's truststore to fix the following message:
Failed to connect to KDC - Failed to communicate with the Active Directory at ldaps://xxx.yyy.com:636: simple bind failed: xxx.yyy.com:636
.
Please see: https://community.hortonworks.com/content/supportkb/148572/failed-to-connect-to-kdc-make-sure-the-se...
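A rough outline of that truststore setup (a sketch only; /tmp/ad-ldaps.crt and the truststore path are placeholders for your own values):
# keytool -import -trustcacerts -alias ad-ldaps -file /tmp/ad-ldaps.crt -keystore /etc/ambari-server/conf/ambari-truststore.jks
# ambari-server setup-security
(choose the truststore option and point it at the same JKS file and password)
# ambari-server restart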
Created 02-06-2019 12:50 AM
Hi Jay, sorry, but all 3 hostname commands return the same:
ambari.example.com
Created 02-06-2019 12:52 AM
Yes, I did the import into the truststore fine. It just does not work.
Created 02-06-2019 12:54 AM
You can also open the same URL from the browser where you are logged in to the Ambari UI.
http://ambari.example.com:8443/api/v1/clusters/cluster-name/hosts?fields=Hosts/ip,Hosts/host_name
.
(OR) You can use the following curl call to write the JSON output to a file like "/tmp/hosts.json". Also, please make sure to use the correct cluster name in the URL.
# curl -k -H "X-Requested-By: ambari" -u admin:admin "http://ambari.example.com:8443/api/v1/clusters/cluster-name/hosts?fields=Hosts/ip,Hosts/host_name" -o "/tmp/hosts.json"
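To quickly confirm the call succeeded (and that the credentials were accepted), you can pretty-print the resulting file; a 403 body like the earlier hosts4.json means the Ambari admin username/password was wrong:
# python -m json.tool /tmp/hosts.json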
.
Created 02-06-2019 01:03 AM
Hi Jay, thanks for the help. I'm not certain what was asked last; do you want this output again? Attached: hosts-2.json