Created 05-24-2017 02:25 PM
Log attached : ambari-server.txt
I am trying to install a 3-node cluster using Ambari.
Machine 1 : Master Node + Local repository + Ambari Server
Machine 2 and 3 : Data Node
The local repository is on the same system as the master node and Ambari Server.
I am using Ambari Server 2.4.2 to install HDP 2.5.
The yum repolist command lists:
1) HDP-2.5
2) HDP-UTILS-1.1.0.21
3) Updates-ambari-2.4.2.0
I am using passwordless SSH for user hdpuser. This is the logged-in user on the system. It is also a sudo user. I have confirmed that passwordless SSH works for this user.
After a lot of install failures, I am trying to install ONLY the master node to understand the problem.
I cannot get past the auto-registration step itself.
I am getting a lot of messages like the following. Not sure if they are "normal" or the cause of the problem:
24 May 2017 15:43:14,842 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named HDP
24 May 2017 15:17:17,424 INFO [ambari-client-thread-25] RepoUtil:156 - Found 0 service repos: []
Registration failure logs :
24 May 2017 15:22:33,895 INFO [ambari-client-thread-25] BootStrapImpl:108 - BootStrapping hosts machine1.domain.net:
24 May 2017 15:22:33,901 INFO [Thread-44] BSRunner:190 - Kicking off the scheduler for polling on logs in /var/run/ambari-server/bootstrap/1
24 May 2017 15:22:33,901 INFO [Thread-44] BSRunner:257 - Host= machine1.domain.net bs=/usr/lib/python2.6/site-packages/ambari_server/bootstrap.py requestDir=/var/run/ambari-server/bootstrap/1 user=hdpuser sshPort=22 keyfile=/var/run/ambari-server/bootstrap/1/sshKey passwordFile null server=machine1.domain.net version=2.4.2.0 serverPort=8080 userRunAs=root timeout=300
24 May 2017 15:22:33,903 INFO [pool-16-thread-1] BSHostStatusCollector:55 - Request directory /var/run/ambari-server/bootstrap/1
24 May 2017 15:22:33,903 INFO [pool-16-thread-1] BSHostStatusCollector:62 - HostList for polling on [machine1.domain.net]
24 May 2017 15:22:33,906 INFO [Thread-44] BSRunner:285 - Bootstrap output, log=/var/run/ambari-server/bootstrap/1/bootstrap.err /var/run/ambari-server/bootstrap/1/bootstrap.out at machine1.domain.net
24 May 2017 15:22:43,906 INFO [pool-16-thread-1] BSHostStatusCollector:55 - Request directory /var/run/ambari-server/bootstrap/1
24 May 2017 15:26:53,930 INFO [pool-16-thread-1] BSHostStatusCollector:62 - HostList for polling on [machine1.domain.net]
24 May 2017 15:27:34,092 WARN [Thread-44] BSRunner:292 - Bootstrap process timed out. It will be destroyed.
24 May 2017 15:27:34,093 INFO [Thread-44] BSRunner:309 - Script log Mesg INFO:root:BootStrapping hosts ['machine1.domain.net'] using /usr/lib/python2.6/site-packages/ambari_server cluster primary OS: redhat7 with user 'hdpuser'with ssh Port '22' sshKey File /var/run/ambari-server/bootstrap/1/sshKey password File null using tmp dir /var/run/ambari-server/bootstrap/1 ambari: machine1.domain.net; server_port: 8080; ambari version: 2.4.2.0; user_run_as: root INFO:root:Executing parallel bootstrap Bootstrap process timed out. It was destroyed.
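With this many INFO lines, it can help to pull out only the WARN and ERROR entries. The following is a minimal sketch, assuming the log lines follow the "timestamp LEVEL [thread] Source:line - message" layout seen above; the names LOG_LINE and problems are made up for illustration and are not part of Ambari.

```python
import re

# Pattern for lines such as:
# 24 May 2017 15:27:34,092 WARN [Thread-44] BSRunner:292 - Bootstrap process timed out.
# The layout is assumed from the excerpts in this post.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<source>\S+)\s+-\s+(?P<msg>.*)$"
)


def problems(lines, levels=("WARN", "ERROR")):
    """Return (timestamp, level, source, message) for lines at the given levels."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            found.append(
                (
                    match.group("ts"),
                    match.group("level"),
                    match.group("source"),
                    match.group("msg"),
                )
            )
    return found
```

Passing it the lines of the attached ambari-server.txt (for example, via open(...).read().splitlines()) would leave only the alert error and the bootstrap timeout warning from the excerpts above.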
Created 05-26-2017 05:51 AM
Finally solved! The problem was that even though SSH was set up without a password, the sudo user ('hdpuser') was set up with a password (meaning I was prompted for a password on every sudo command).
I modified the sudoers entry to make sudo passwordless on all cluster machines. That did it!
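For anyone hitting the same timeout, a quick way to test for this condition is to run sudo in non-interactive mode, where it fails instead of prompting. The sketch below assumes it is run as the bootstrap user (hdpuser here) on each cluster machine; the function name sudo_is_passwordless is made up for illustration. The exact sudoers entry used is not given in this thread; a NOPASSWD entry for the user is the usual approach.

```python
import subprocess


def sudo_is_passwordless(run=subprocess.run):
    """Return True if sudo runs a command without asking for a password."""
    try:
        # -n (non-interactive) makes sudo exit with an error instead of
        # prompting when a password would be required.
        result = run(
            ["sudo", "-n", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # sudo is not installed on this machine.
        return False
    return result.returncode == 0
```

Calling sudo_is_passwordless() on each host and getting False would point to the situation described above: a sudo password prompt that the bootstrap cannot answer.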
Created 05-24-2017 06:10 PM
Can you check if you can SSH from the Ambari Server to the Agents?
You might have some DNS issues, or it may not be able to SSH.
Created 05-24-2017 06:22 PM
I am able to SSH from the Ambari Server (system 1) to the data node systems (systems 2 and 3).
I have added all three hosts to /etc/hosts on all three systems.
"hostname -f" on all three systems gives me the FQDN.
Just thinking, could this be a proxy server issue? I have to bypass the proxy in Firefox to access localhost.
I have set proxy=_none_ in the .repo files to skip the proxy for yum.
Not sure if I have to do the same for curl?
Created 05-24-2017 06:23 PM
Can you install the Ambari agent on those machines? Or any package from your repo?
Created 05-24-2017 06:31 PM
Yes. I can manually install agents on all three systems. (As I said, I have added the no-proxy setting to the repo files. Without that it wasn't working.)
Created 05-24-2017 06:41 PM
If I do "curl <localrepository>", I get a DNS error. Just wondering if this is the cause of the problem.
Created 05-25-2017 02:55 PM
I fixed the proxy settings for curl, wget, ambari-server and yum. I am still getting exactly the same problem.
Created 05-24-2017 09:33 PM
@MB If you're getting a DNS error, that needs to be resolved either by configuring DNS for the hosts or by manually adding the host info to /etc/hosts on each node before you retry the cluster installation. Same goes for the repositories if you're using local repos.
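To confirm name resolution before retrying, something like the following could be run on each node. It is only a sketch: check_hosts is a made-up helper name, and machine1.domain.net is the one hostname that appears in the logs above, so the other hostnames would need to be filled in.

```python
import socket


def check_hosts(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None if lookup fails."""
    results = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror (failed lookup) is a subclass of OSError.
            results[name] = None
    return results
```

For example, check_hosts(["machine1.domain.net"]) returns a dictionary in which any host mapped to None still needs a DNS record or an /etc/hosts entry on that node.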
Created 05-25-2017 02:56 PM
I fixed the DNS and proxy settings. I am still getting exactly the same error: the bootstrap times out.