Created 04-08-2018 11:06 PM
Hello,
I am trying to set up a multi-node cluster using Ambari. I have already done the steps described here, including setting up passwordless SSH between the nodes. When I run `ssh mymaster1`, `ssh myslave1` or `ssh myslave2` from the master node, I can connect successfully to the corresponding node. However, I have not configured SSH in the opposite direction, i.e. from the slaves to the master (I hope that is not required). In other words, when I run, for example, `ssh mymaster1` from a slave, the connection is denied. Please let me know if that needs to be fixed.
I get the following error at the "Confirm Hosts" step when I install a cluster using the Ambari UI:
==========================
Creating target directory...
==========================
Command start time 2018-04-08 14:53:45
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
SSH command execution finished
host=eureambarimaster1, exitcode=255
Command end time 2018-04-08 14:53:45

ERROR: Bootstrap of host eureambarimaster1 fails because previous action finished with non-zero exit code (255)
ERROR MESSAGE: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

STDOUT:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Note that I do not run `ambari-agent start`; I only run `ambari-server start`.
Thanks.
Created 04-08-2018 11:14 PM
When you set up passwordless SSH, make sure that it is set up for the correct user.
For example, if you are running both the Ambari Server and the Ambari Agents as the "root" user, then set up passwordless SSH for that same user.
I suggest you try this again:
1. Generate SSH keys on the Ambari Server host (Master). Leave the passphrase empty when generating the keys.
# ssh-keygen
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
2. From the Ambari Server host, run the following commands to set up passwordless SSH from the Master to all Slave hosts:
# ssh-copy-id -i ~/.ssh/id_rsa root@slave1.example.com
# ssh-copy-id -i ~/.ssh/id_rsa root@slave2.example.com
3. You should now be able to test passwordless SSH from the Ambari Server to the Ambari Agents. (Passwordless SSH from the agents back to the server is not needed.)
4. When the Ambari UI asks for the private key, paste the contents of this file:
# cat /root/.ssh/id_rsa
5. The file permissions should look something like this (on the Master / Ambari Server host):
# ls -l /root/.ssh/id_rsa*
-rw-------. 1 root root 1679 Mar 13 08:39 /root/.ssh/id_rsa
-rw-r--r--. 1 root root 407 Mar 13 08:39 /root/.ssh/id_rsa.pub
# ls -ld /root/.ssh
drwx------. 2 root root 58 Mar 15 23:27 /root/.ssh
6. The file permissions should look something like this (on all the Slave / Agent hosts):
# ls -l /root/.ssh/
-rw-------. 1 root root 819 Jun 9 2017 authorized_keys
# ls -ld /root/.ssh
drwx------. 2 root root 28 Jun 9 2017 /root/.ssh
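Steps 3, 5 and 6 can be applied and checked from a shell roughly as follows. This is a minimal sketch, assuming bash on the CentOS/RHEL hosts used in this thread and the RSA key file names shown above; `fix_ssh_perms` and `check_batch_ssh` are hypothetical helper names, not part of Ambari.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ambari) for the passwordless-SSH steps above.

# Apply the permissions shown in steps 5 and 6 to an .ssh directory
# (default: the current user's). With the default StrictModes setting,
# sshd rejects keys if ~/.ssh or authorized_keys is writable by others,
# and the ssh client refuses a private key that others can read.
fix_ssh_perms() {
  local dir="${1:-$HOME/.ssh}"
  chmod 700 "$dir"
  local f
  for f in "$dir/id_rsa" "$dir/authorized_keys"; do
    # Private key and authorized_keys: owner read/write only.
    [ -f "$f" ] && chmod 600 "$f"
  done
  # The public key may stay world-readable.
  [ -f "$dir/id_rsa.pub" ] && chmod 644 "$dir/id_rsa.pub"
  return 0
}

# Step 3: confirm that key-based login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password,
# mimicking an unattended login.
check_batch_ssh() {
  local host rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, as the same user Ambari will connect as:
#   fix_ssh_perms        # on the Ambari Server host and on every slave
#   check_batch_ssh root@slave1.example.com root@slave2.example.com
```

A `FAIL` line here corresponds to the `Permission denied (publickey,...)` / exit code 255 seen in the original bootstrap log.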
The problem is most likely caused by one of the following:
1. The "~/.ssh" directory and its contents do not have the permissions shown above.
2. The FQDN is not set up correctly on every host. See https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.5/bk_ambari-installation-ppc/content/edit_the... and check the output of the following command on every host to confirm that the FQDN is correct:
# hostname -f
3. Passwordless SSH is not set up for the correct user. Also make sure that SELinux is disabled and the firewall is off on all nodes, including the Ambari Server host.
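A rough pre-flight check for the causes listed above, to run on every host (including the Ambari Server host). This sketch assumes bash on CentOS/RHEL 7 with systemd and firewalld; `looks_like_fqdn` and `preflight` are hypothetical names, and the "contains a dot" rule is only a heuristic, not Ambari's own validation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks (not part of Ambari).

# Heuristic: a fully qualified name is non-empty, contains a dot,
# and does not start or end with one.
looks_like_fqdn() {
  local name="$1"
  [[ -n "$name" && "$name" == *.* && "$name" != .* && "$name" != *. ]]
}

preflight() {
  local fqdn
  fqdn="$(hostname -f 2>/dev/null || true)"
  if looks_like_fqdn "$fqdn"; then
    echo "FQDN:      $fqdn"
  else
    echo "FQDN:      '$fqdn' does not look fully qualified"
  fi
  # getenforce is only present when the SELinux utilities are installed.
  if command -v getenforce >/dev/null 2>&1; then
    echo "SELinux:   $(getenforce)"
  fi
  # On CentOS/RHEL 7 the default firewall service is firewalld.
  if command -v systemctl >/dev/null 2>&1; then
    echo "firewalld: $(systemctl is-active firewalld 2>/dev/null)"
  fi
}

# Usage: preflight
# Expected on a correctly prepared host: a dotted FQDN,
# "SELinux:   Disabled" and "firewalld: inactive".
```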
Created 04-09-2018 09:49 AM
Hello Jay,
Thanks. I reconfigured everything for the user "centos", and now the Ambari UI gets through more of the steps (see the logs below). However, at the end I get the following error: "Server at https://eureambarimaster1.local.eurecat.org:8440 is not reachable"
==========================
Creating target directory...
==========================
Command start time 2018-04-09 09:29:40
Connection to eureambarimaster1.local.eurecat.org closed.
SSH command execution finished
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:40

==========================
Copying ambari sudo script...
==========================
Command start time 2018-04-09 09:29:40
scp /var/lib/ambari-server/ambari-sudo.sh
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:40

==========================
Copying common functions script...
==========================
Command start time 2018-04-09 09:29:40
scp /usr/lib/python2.6/site-packages/ambari_commons
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:41

==========================
Copying OS type check script...
==========================
Command start time 2018-04-09 09:29:41
scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:41

==========================
Running OS type check...
==========================
Command start time 2018-04-09 09:29:41
Cluster primary/cluster OS family is redhat7 and local/current OS family is redhat7
Connection to eureambarimaster1.local.eurecat.org closed.
SSH command execution finished
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:41

==========================
Checking 'sudo' package on remote host...
==========================
Command start time 2018-04-09 09:29:41
sudo-1.8.19p2-11.el7_4.x86_64
Connection to eureambarimaster1.local.eurecat.org closed.
SSH command execution finished
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:42

==========================
Copying repo file to 'tmp' folder...
==========================
Command start time 2018-04-09 09:29:42
scp /etc/yum.repos.d/ambari.repo
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:43

==========================
Moving file to repo dir...
==========================
Command start time 2018-04-09 09:29:43
Connection to eureambarimaster1.local.eurecat.org closed.
SSH command execution finished
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:43

==========================
Changing permissions for ambari.repo...
==========================
Command start time 2018-04-09 09:29:43
Connection to eureambarimaster1.local.eurecat.org closed.
SSH command execution finished
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:43

==========================
Copying setup script file...
==========================
Command start time 2018-04-09 09:29:43
scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:43

==========================
Running setup agent script...
==========================
Command start time 2018-04-09 09:29:43
Failed to set locale, defaulting to C
('WARNING 2018-04-09 09:22:38,867 NetUtil.py:116 - Server at https://eureambarimaster1.local.eurecat.org:8440 is not reachable, sleeping for 10 seconds...
INFO 2018-04-09 09:22:38,868 HeartbeatHandlers.py:115 - Stop event received
INFO 2018-04-09 09:22:38,868 NetUtil.py:122 - Stop event received
INFO 2018-04-09 09:22:38,868 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2018-04-09 09:22:38,868 ExitHelper.py:67 - Cleanup finished, exiting with code:0
INFO 2018-04-09 09:22:40,140 main.py:223 - Agent died gracefully, exiting.
INFO 2018-04-09 09:22:40,140 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2018-04-09 09:29:45,567 main.py:90 - loglevel=logging.INFO
INFO 2018-04-09 09:29:45,567 main.py:90 - loglevel=logging.INFO
INFO 2018-04-09 09:29:45,567 main.py:90 - loglevel=logging.INFO
INFO 2018-04-09 09:29:45,569 DataCleaner.py:39 - Data cleanup thread started
INFO 2018-04-09 09:29:45,570 DataCleaner.py:120 - Data cleanup started
INFO 2018-04-09 09:29:45,570 DataCleaner.py:122 - Data cleanup finished
INFO 2018-04-09 09:29:45,575 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2018-04-09 09:29:45,576 main.py:349 - Connecting to Ambari server at https://eureambarimaster1.local.eurecat.org:8440 (172.20.61.91)
INFO 2018-04-09 09:29:45,577 NetUtil.py:62 - Connecting to https://eureambarimaster1.local.eurecat.org:8440/ca
ERROR 2018-04-09 09:29:45,637 NetUtil.py:88 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)
ERROR 2018-04-09 09:29:45,637 NetUtil.py:89 - SSLError: Failed to connect. Please check openssl library versions. Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
WARNING 2018-04-09 09:29:45,639 NetUtil.py:116 - Server at https://eureambarimaster1.local.eurecat.org:8440 is not reachable, sleeping for 10 seconds...
', None)
('WARNING 2018-04-09 09:22:38,867 NetUtil.py:116 - Server at https://eureambarimaster1.local.eurecat.org:8440 is not reachable, sleeping for 10 seconds...
INFO 2018-04-09 09:22:38,868 HeartbeatHandlers.py:115 - Stop event received
INFO 2018-04-09 09:22:38,868 NetUtil.py:122 - Stop event received
INFO 2018-04-09 09:22:38,868 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2018-04-09 09:22:38,868 ExitHelper.py:67 - Cleanup finished, exiting with code:0
INFO 2018-04-09 09:22:40,140 main.py:223 - Agent died gracefully, exiting.
INFO 2018-04-09 09:22:40,140 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2018-04-09 09:29:45,567 main.py:90 - loglevel=logging.INFO
INFO 2018-04-09 09:29:45,567 main.py:90 - loglevel=logging.INFO
INFO 2018-04-09 09:29:45,567 main.py:90 - loglevel=logging.INFO
INFO 2018-04-09 09:29:45,569 DataCleaner.py:39 - Data cleanup thread started
INFO 2018-04-09 09:29:45,570 DataCleaner.py:120 - Data cleanup started
INFO 2018-04-09 09:29:45,570 DataCleaner.py:122 - Data cleanup finished
INFO 2018-04-09 09:29:45,575 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2018-04-09 09:29:45,576 main.py:349 - Connecting to Ambari server at https://eureambarimaster1.local.eurecat.org:8440 (172.20.61.91)
INFO 2018-04-09 09:29:45,577 NetUtil.py:62 - Connecting to https://eureambarimaster1.local.eurecat.org:8440/ca
ERROR 2018-04-09 09:29:45,637 NetUtil.py:88 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)
ERROR 2018-04-09 09:29:45,637 NetUtil.py:89 - SSLError: Failed to connect. Please check openssl library versions. Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
WARNING 2018-04-09 09:29:45,639 NetUtil.py:116 - Server at https://eureambarimaster1.local.eurecat.org:8440 is not reachable, sleeping for 10 seconds...
', None)
Connection to eureambarimaster1.local.eurecat.org closed.
SSH command execution finished
host=eureambarimaster1.local.eurecat.org, exitcode=0
Command end time 2018-04-09 09:29:48

Registering with the server...
Registration with the server failed.
Created 04-09-2018 09:49 AM
Your most recent error is caused by:
ERROR 2018-04-09 09:29:45,637 NetUtil.py:88 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)
Please try the following. If the file "/etc/python/cert-verification.cfg" does not exist, create it with an "[https]" section containing the line "verify=platform_default", so that the command below has a line to change. Then run:
# sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg
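The fix above can be made idempotent, so that it works whether or not the file already exists. This is a sketch, assuming bash and GNU sed (for `sed -i`); the function name `disable_py_cert_verify` and its optional path argument (useful for testing) are hypothetical. Unlike the one-liner, it rewrites any existing `verify=` value, not only `platform_default`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set verify=disable in the RHEL 7 Python
# cert-verification config, creating the file if it is missing.
# Note: this turns off HTTPS certificate verification for the system
# Python on this host, not only for ambari-agent.
disable_py_cert_verify() {
  local cfg="${1:-/etc/python/cert-verification.cfg}"
  if [ ! -f "$cfg" ]; then
    mkdir -p "$(dirname "$cfg")"
    printf '[https]\nverify=disable\n' > "$cfg"
  else
    # Replace whatever value is currently set (platform_default, enable, ...).
    sed -i 's/^verify=.*/verify=disable/' "$cfg"
  fi
  grep -q '^verify=disable$' "$cfg" || echo "warning: no verify= line in $cfg" >&2
}

# Usage (as root, on the master and on every slave):
#   disable_py_cert_verify
# then retry the "Confirm Hosts" step in the Ambari UI.
```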
Created 04-09-2018 09:53 AM
[SSL: CERTIFICATE_VERIFY_FAILED]
The following Red Hat article provides more detailed information about the "certificate verify failed (_ssl.c" issue on RHEL 7, "Controlling and troubleshooting certificate verification":
https://access.redhat.com/articles/2039753#controlling-certificate-verification-7
Created 04-09-2018 09:59 AM
Thank you so much. It worked! I had to run this command on the master and on all the slaves.
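Since the change is needed on the master and on every slave, the same command can be pushed to all hosts over the passwordless SSH set up earlier. This is a sketch, assuming the SSH user (e.g. "centos") has passwordless sudo and each host's file already contains a `verify=platform_default` line; the function name `run_cert_fix_on_hosts` and the `DRY_RUN` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the cert-verification sed command on each host.
# Set DRY_RUN=1 to print the ssh commands instead of executing them.
run_cert_fix_on_hosts() {
  # The single quotes are kept so the remote shell sees one sed expression.
  local remote_cmd="sudo sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg"
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'ssh %s %s\n' "$host" "$remote_cmd"
    else
      # Add -t to ssh if sudo on the target host requires a tty.
      ssh "$host" "$remote_cmd" || echo "failed on $host" >&2
    fi
  done
}

# Usage (slave host names are placeholders):
#   DRY_RUN=1 run_cert_fix_on_hosts centos@eureambarimaster1.local.eurecat.org centos@slave1.example.com
#   run_cert_fix_on_hosts centos@eureambarimaster1.local.eurecat.org centos@slave1.example.com
```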