Installation of HDP2.4 with Ambari got stuck

I am setting up a 3-node HDP cluster using Ambari. I have installed the Ambari server and am continuing the step-by-step installation through the web UI. I had completed the setup up to step 9 (Install, Start and Test), but it would not progress past 32% (see screenshot 2017-08-19.png). So I restarted the Ambari server, and now the UI has started again from the first step.

Now I have reached step 3, Confirm Hosts. The hosts are registered and their status is Success, but the UI does not move ahead and the message shown is 'Please wait while the hosts are being checked for potential problems...' (see screenshot 2017-08-19-2.png). I waited a long time, but it still does not progress any further.

I have also tried restarting the Ambari agents and the Ambari server, with the same result. The steps I am performing now already passed successfully once, and I haven't changed anything major on the nodes.

Please suggest a solution.


2017-08-19.png
1 ACCEPTED SOLUTION

Master Mentor

@Anup Shirolkar

From the Ambari server log:

2017-08-18 11:17:39,720 [CRITICAL] [HIVE] [hive_server_process] (HiveServer2 Process) Connection failed on host hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net:10000 (Traceback (most recent call last): 

Ensure the Ambari agent is running and the port is free.

(Ambari Agent Heartbeat) hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net

Copy the Ambari and HDP* .repo files to /etc/yum.repos.d/ on all the other hosts.

Confirm the repos are accessible by running:

# yum repolist

You should see something like this:

HDP-2.3.2.0                                       | 2.9 kB     00:00
HDP-UTILS-1.1.0.20                                | 2.9 kB     00:00
Updates-ambari-2.1.2.1                            | 2.9 kB     00:00

Check that the ambari-agents on these nodes are running; if not, restart them. Ensure the hostname value in /etc/ambari-agent/conf/ambari-agent.ini points to your Ambari server:

[server] 
hostname={your-ambari-server} 
url_port=8440 
secured_url_port=8441

None of the agents is sending heartbeats:

hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net is not sending heartbeats
hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net is not sending heartbeats
hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net is not sending heartbeats
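
The hostname setting in ambari-agent.ini can be read out with a small script, which makes it easier to compare across nodes. The following is a minimal sketch, assuming the file uses the plain key=value layout shown above; the function name agent_server_hostname is made up for illustration and is not part of Ambari.

```shell
# Hypothetical helper (not part of Ambari): print the hostname value from
# the [server] section of an ambari-agent.ini style file given as $1.
agent_server_hostname() {
  awk -F= '
    /^\[/ { in_server = ($0 ~ /^\[server\]/) }      # track the current section
    in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
      gsub(/^[ \t]+|[ \t]+$/, "", $2)               # trim spaces around the value
      print $2
      exit
    }
  ' "$1"
}

# Example: agent_server_hostname /etc/ambari-agent/conf/ambari-agent.ini
```

On each agent node the printed value should be the FQDN of the Ambari server.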

Error

Caused by: org.apache.ambari.server.HostNotFoundException: Host not found, hostname= 

Double check your DNS.

$ hostname -f

The output should be the FQDN. I see a lot of 'connection refused' errors in the log; can you ensure the Ambari server can reach the other hosts in the cluster?
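
The hostname -f check can be wrapped in a small test so it is easy to repeat on all three nodes. This is only a sketch: the function name looks_like_fqdn is invented here, and the rule it applies (at least one dot, not a localhost alias) is a simplifying assumption rather than a full DNS validation.

```shell
# Hypothetical helper: succeed only if the given name looks fully qualified,
# i.e. it contains at least one dot and is not a localhost alias.
looks_like_fqdn() {
  case "$1" in
    localhost|localhost.*|"") return 1 ;;   # loopback aliases and empty names
    *.*) return 0 ;;                         # has a domain part
    *) return 1 ;;                           # short name only
  esac
}

# Example: looks_like_fqdn "$(hostname -f)" || echo "hostname -f is not an FQDN"
```

A short name such as hdp25-node1 fails the check, while the full cloudapp.net name passes.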

15 REPLIES

@Geoffrey Shelton Okot

Thanks for bearing with me.

I checked all the configurations as per your suggestion.

1) Ambari-server and agents (all hosts) are running.

2) The repolist has the required entries:

repo id                
!HDP-2.4               
!HDP-UTILS-1.1.0.20    
!Updates-ambari-2.2.2.0

3) /etc/ambari-agent/conf/ambari-agent.ini shows the server hostname (on all 3 nodes):

[server]
hostname=hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
url_port=8440
secured_url_port=8441

[agent]
prefix=/var/lib/ambari-agent/data
.....

The hostnames are also correct:

[anup@hdp25-node3 ~]$ cat /etc/hosts
127.0.0.1   hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.4 hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
10.0.0.5 hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
10.0.0.6 hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net

I checked the SELinux mode with `getenforce` and it was permissive, so I changed it to `disabled` and rebooted the nodes for the change to take effect.

But I think this restart has messed up some configuration on my server. All the Ambari-related settings are as given above, but now I can't access the UI as before. I get a 'The site can't be found' error.

My setup runs on Microsoft Azure. I have tried the URL in different forms (public IP, hostname).

I have been trying several options, such as this one from another post:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{"wizard-data":"{\"userName\":\"admin\",\"controllerName\":\"\"}"}' http://40.121.204.95:8080/api/v1/persist

This one times out.

I don't have any iptables set up. The Ambari server shows up in ps, and port 8080 is listening:

[root@hdp25-node1 anup]# service iptables stop
Redirecting to /bin/systemctl stop iptables.service
Failed to stop iptables.service: Unit iptables.service not loaded.

[root@hdp25-node1 anup]# ps -aux | grep ambari-server
root       1965  0.0  0.0  11636   624 pts/0    S    05:32   0:00 /bin/sh -c ulimit -n 10000 ; /opt/jdk1.8.0_141/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Dsun.zip.disableMemoryMapping=true   -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -cp '/etc/ambari-server/conf:/usr/lib/ambari-server/*:/usr/share/java/postgresql-jdbc.jar' org.apache.ambari.server.controller.AmbariServer > /var/log/ambari-server/ambari-server.out 2>&1 || echo $? > /var/run/ambari-server/ambari-server.exitcode &
root       1966 20.0  3.0 5683744 500464 pts/0  Sl   05:32   1:07 /opt/jdk1.8.0_141/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Dsun.zip.disableMemoryMapping=true -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -Xms512m -Xmx2048m -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -cp /etc/ambari-server/conf:/usr/lib/ambari-server/*:/usr/share/java/postgresql-jdbc.jar org.apache.ambari.server.controller.AmbariServer
root       2570  0.0  0.0 112660   976 pts/0    S+   05:38   0:00 grep --color=auto ambari-server

[root@hdp25-node1 anup]# netstat -anop | grep 8080
tcp6       0      0 :::8080                 :::*                    LISTEN      1966/java            off (0.00/0/0)

Please let me know which bit I am missing.

Latest logs:

[root@hdp25-node1 anup]# tail -fn 100 /var/log/ambari-server/ambari-server.log
20 Aug 2017 05:33:06,674  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HST_SERVER on hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,686  INFO [main] AmbariServer:534 - ********* Initializing ActionManager **********
20 Aug 2017 05:33:06,687  INFO [main] AmbariServer:537 - ********* Initializing Controller **********
20 Aug 2017 05:33:06,687  INFO [main] AmbariServer:541 - ********* Initializing Scheduled Request Manager **********
20 Aug 2017 05:33:06,689  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SPARK_JOBHISTORYSERVER on hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,699  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SUPERVISOR on hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,701  INFO [main] Server:272 - jetty-8.1.17.v20150415
20 Aug 2017 05:33:06,709  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component NODEMANAGER on hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,720  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,990  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:157 - Heartbeat lost from host hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,992  INFO [ambari-hearbeat-monitor] TopologyManager:387 - Hearbeat for host hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net lost thus removing it from available hosts.
20 Aug 2017 05:33:06,993  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component METRICS_MONITOR on hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,994  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component METRICS_COLLECTOR on hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,995  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component FLUME_HANDLER on hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,997  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component DATANODE on hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:06,999  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HST_AGENT on hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,000  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SUPERVISOR on hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,002  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component NODEMANAGER on hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,003  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,096  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:157 - Heartbeat lost from host hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,101  INFO [ambari-hearbeat-monitor] TopologyManager:387 - Hearbeat for host hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net lost thus removing it from available hosts.
20 Aug 2017 05:33:07,101  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component METRICS_MONITOR on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,102  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component FLUME_HANDLER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,103  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SECONDARY_NAMENODE on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,104  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component DATANODE on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,105  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HIVE_SERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,105  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component MYSQL_SERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,106  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HIVE_METASTORE on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,107  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component WEBHCAT_SERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,108  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HISTORYSERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,109  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component OOZIE_SERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,110  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component HST_AGENT on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,111  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component SUPERVISOR on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,111  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component NIMBUS on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,112  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component DRPC_SERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,113  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component STORM_UI_SERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,113  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component NODEMANAGER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,114  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component APP_TIMELINE_SERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,115  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component RESOURCEMANAGER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:07,116  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
20 Aug 2017 05:33:16,570  INFO [main] AbstractConnector:338 - Started SelectChannelConnector@0.0.0.0:8080
20 Aug 2017 05:33:16,570  INFO [main] Server:272 - jetty-8.1.17.v20150415
20 Aug 2017 05:33:18,680  INFO [main] SslContextFactory:300 - Enabled Protocols [SSLv2Hello, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
20 Aug 2017 05:33:18,681  INFO [main] AbstractConnector:338 - Started SslSelectChannelConnector@0.0.0.0:8440
20 Aug 2017 05:33:18,694  INFO [main] SslContextFactory:300 - Enabled Protocols [SSLv2Hello, TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
20 Aug 2017 05:33:18,695  INFO [main] AbstractConnector:338 - Started SslSelectChannelConnector@0.0.0.0:8441
20 Aug 2017 05:33:18,695  INFO [main] AmbariServer:558 - ********* Started Server **********
20 Aug 2017 05:33:18,695  INFO [main] ActionManager:77 - Starting scheduler thread
20 Aug 2017 05:33:18,696  INFO [main] ServerActionExecutor:146 - Starting Server Action Executor thread...
20 Aug 2017 05:33:18,698  INFO [main] ServerActionExecutor:173 - Server Action Executor thread started.
20 Aug 2017 05:33:18,698  INFO [main] AmbariServer:561 - ********* Started ActionManager **********
20 Aug 2017 05:33:18,698  INFO [main] ExecutionScheduleManager:201 - Starting scheduler
20 Aug 2017 05:33:18,762  INFO [MLog-Init-Reporter] MLog:212 - MLog clients using slf4j logging.
20 Aug 2017 05:33:18,845  INFO [main] C3P0Registry:212 - Initializing c3p0-0.9.5.2 [built 08-December-2015 22:06:04 -0800; debug? true; trace: 10]
20 Aug 2017 05:33:18,887  INFO [main] StdSchedulerFactory:1184 - Using default implementation for ThreadExecutor
20 Aug 2017 05:33:18,906  INFO [main] SchedulerSignalerImpl:61 - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
20 Aug 2017 05:33:18,906  INFO [main] QuartzScheduler:240 - Quartz Scheduler v.2.2.1 created.
20 Aug 2017 05:33:18,907  INFO [main] JobStoreTX:670 - Using thread monitor-based data access locking (synchronization).
20 Aug 2017 05:33:18,908  INFO [main] JobStoreTX:59 - JobStoreTX initialized.
20 Aug 2017 05:33:18,909  INFO [main] QuartzScheduler:305 - Scheduler meta-data: Quartz Scheduler (v2.2.1) 'ExecutionScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 5 threads.
  Using job-store 'org.quartz.impl.jdbcjobstore.JobStoreTX' - which supports persistence. and is not clustered.


20 Aug 2017 05:33:18,909  INFO [main] StdSchedulerFactory:1339 - Quartz scheduler 'ExecutionScheduler' initialized from an externally provided properties instance.
20 Aug 2017 05:33:18,909  INFO [main] StdSchedulerFactory:1343 - Quartz scheduler version: 2.2.1
20 Aug 2017 05:33:18,909  INFO [main] QuartzScheduler:2311 - JobFactory set to: org.apache.ambari.server.state.scheduler.GuiceJobFactory@14a97e7a
20 Aug 2017 05:33:18,910  INFO [main] AmbariServer:564 - ********* Started Scheduled Request Manager **********
20 Aug 2017 05:33:18,914  INFO [main] AmbariServer:567 - ********* Started Services **********
20 Aug 2017 05:33:18,924  INFO [AmbariServerAlertService STARTING] AmbariServerAlertService:257 - Scheduled server alert ambari_server_agent_heartbeat to run every 2 minutes
20 Aug 2017 05:33:18,925  INFO [AmbariServerAlertService STARTING] AmbariServerAlertService:257 - Scheduled server alert ambari_server_stale_alerts to run every 5 minutes
20 Aug 2017 05:33:43,587 ERROR [qtp-ambari-agent-54] HeartBeatHandler:198 - CurrentResponseId unknown for hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net - send register command
20 Aug 2017 05:33:43,668  INFO [qtp-ambari-agent-55] HeartBeatHandler:402 - agentOsType = redhat7
20 Aug 2017 05:33:43,739  INFO [qtp-ambari-agent-55] HostImpl:285 - Received host registration, host=[hostname=hdp25-node1,fqdn=hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net,domain=wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net,architecture=x86_64,processorcount=4,physicalprocessorcount=4,osname=redhat,osversion=7.3,osfamily=redhat,memory=16416608,uptime_hours=0,mounts=(available=25310412,mountpoint=/,used=7714864,percent=24%,size=33025276,device=/dev/sda2,type=xfs)(available=8198368,mountpoint=/dev,used=0,percent=0%,size=8198368,device=devtmpfs,type=devtmpfs)(available=8208304,mountpoint=/dev/shm,used=0,percent=0%,size=8208304,device=tmpfs,type=tmpfs)(available=8199824,mountpoint=/run,used=8480,percent=1%,size=8208304,device=tmpfs,type=tmpfs)(available=403144,mountpoint=/boot,used=105436,percent=21%,size=508580,device=/dev/sda1,type=xfs)(available=29055312,mountpoint=/mnt/resource,used=2146336,percent=7%,size=32895696,device=/dev/sdb1,type=ext4)]
, registrationTime=1503207223668, agentVersion=2.2.2.0
20 Aug 2017 05:33:43,740  INFO [qtp-ambari-agent-55] TopologyManager:311 - TopologyManager.onHostRegistered: Entering
20 Aug 2017 05:33:43,740  INFO [qtp-ambari-agent-55] TopologyManager:313 - TopologyManager.onHostRegistered: host = hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net is already associated with the cluster or is currently being processed
20 Aug 2017 05:33:43,859  INFO [qtp-ambari-agent-55] HeartBeatHandler:469 - Recovery configuration set to RecoveryConfig{, type=AUTO_START, maxCount=6, windowInMinutes=60, retryGap=5, maxLifetimeCount=1024, disabledComponents=, enabledComponents=METRICS_COLLECTOR}
20 Aug 2017 05:33:55,422  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component METRICS_GRAFANA of service AMBARI_METRICS of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,435  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component FLUME_HANDLER of service FLUME of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,451  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component METRICS_MONITOR of service AMBARI_METRICS of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,462  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component SPARK_JOBHISTORYSERVER of service SPARK of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,477  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component HST_AGENT of service SMARTSENSE of cluster hdp24 has changed from UNKNOWN to STARTED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,492  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component SUPERVISOR of service STORM of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,501  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component NODEMANAGER of service YARN of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,508  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component KNOX_GATEWAY of service KNOX of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,515  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component ZOOKEEPER_SERVER of service ZOOKEEPER of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,521  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component NAMENODE of service HDFS of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,527  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component DATANODE of service HDFS of cluster hdp24 has changed from UNKNOWN to INSTALLED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:33:55,536  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:603 - State of service component HST_SERVER of service SMARTSENSE of cluster hdp24 has changed from UNKNOWN to STARTED at host hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net according to STATUS_COMMAND report
20 Aug 2017 05:35:18,967  INFO [Thread-21] AbstractPoolBackedDataSource:212 - Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, contextClassLoaderSource -> caller, dataSourceName -> z8kflt9p1yig0g5jqyczv|7767c18d, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.postgresql.Driver, extensions -> {}, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, forceSynchronousCheckins -> false, forceUseNamedDriverClass -> false, identityToken -> z8kflt9p1yig0g5jqyczv|7767c18d, idleConnectionTestPeriod -> 50, initialPoolSize -> 3, jdbcUrl -> jdbc:postgresql://localhost/ambari, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 5, maxStatements -> 0, maxStatementsPerConnection -> 120, minPoolSize -> 1, numHelperThreads -> 3, preferredTestQuery -> select 0, privilegeSpawnedThreads -> false, properties -> {user=******, password=******}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> true, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ]
20 Aug 2017 05:35:19,044  INFO [Thread-21] JobStoreTX:861 - Freed 0 triggers from 'acquired' / 'blocked' state.
20 Aug 2017 05:35:19,061  INFO [Thread-21] JobStoreTX:871 - Recovering 0 jobs that were in-progress at the time of the last shut-down.
20 Aug 2017 05:35:19,061  INFO [Thread-21] JobStoreTX:884 - Recovery complete.
20 Aug 2017 05:35:19,062  INFO [Thread-21] JobStoreTX:891 - Removed 0 'complete' triggers.
20 Aug 2017 05:35:19,062  INFO [Thread-21] JobStoreTX:896 - Removed 0 stale fired job entries.
20 Aug 2017 05:35:19,068  INFO [Thread-21] QuartzScheduler:575 - Scheduler ExecutionScheduler_$_NON_CLUSTERED started.


Master Mentor

@Anup Shirolkar

Your /etc/hosts entry looks wrong. It should look like the following; you are advised never to change the first 2 lines, for IPv4 and IPv6:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.4 hdp25-node1.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
10.0.0.5 hdp25-node2.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net
10.0.0.6 hdp25-node3.wulme4ci31tu3lwdofvykqwgkh.bx.internal.cloudapp.net

Can you change that on all the hosts and retry? That could be the cause of the lost connections.
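
The same /etc/hosts problem can be spotted on each node with a short script. This is a minimal sketch, assuming that only names beginning with "localhost" belong on the loopback lines; the function name loopback_extra_names is hypothetical.

```shell
# Hypothetical helper: print every loopback line (127.0.0.1 or ::1) of the
# hosts file given as $1 that also carries a non-localhost name.
loopback_extra_names() {
  awk '
    $1 == "127.0.0.1" || $1 == "::1" {
      for (i = 2; i <= NF; i++)
        if ($i !~ /^localhost/) { print; break }   # report the line once
    }
  ' "$1"
}

# Example: loopback_extra_names /etc/hosts
```

No output means the loopback lines are clean. Any line it prints maps the node's own FQDN to a loopback address and is a candidate for the fix above.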

Why does your repolist output have exclamation marks?

!HDP-2.4              
!HDP-UTILS-1.1.0.20    
!Updates-ambari-2.2.2.0

Can you copy and paste the contents of the files below?

cat /etc/yum.repos.d/ambari.repo
cat /etc/yum.repos.d/hdp.repo

I made the change in /etc/hosts as per your suggestion. Also, the firewall service had been started again by the VM restart, so I had to turn it off. Now the Ambari UI is connecting.
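
For anyone who would rather keep the firewall running than turn it off, one option is to check whether the Ambari ports are open. This is a sketch under assumptions: it presumes firewalld is the firewall in use, takes the ports from the server log earlier in the thread (8080 for the UI, 8440 and 8441 for the agents), and does not handle port ranges; the function name missing_ports is hypothetical.

```shell
# Hypothetical helper: read a space-separated port list on stdin (the format
# printed by `firewall-cmd --list-ports`) and print each wanted port that is absent.
missing_ports() {
  open=" $(tr '\n' ' ') "
  for p in "$@"; do
    case "$open" in
      *" $p "*) ;;            # port is already open
      *) echo "$p" ;;         # port is not in the list
    esac
  done
}

# Example: firewall-cmd --list-ports | missing_ports 8080/tcp 8440/tcp 8441/tcp
```

Any port it prints could then be opened with firewall-cmd --permanent --add-port=<port> followed by firewall-cmd --reload.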

I have started the services and tested a few HDFS and Hive commands, but I cannot see the YARN ResourceManager UI even though the YARN service is up and running. Please suggest a solution for that.

About the ! in the repo names: the RHEL site says it is due to invalid metadata.
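
The flagged repos can be picked out of the yum repolist output mechanically. As far as the yum man page describes it, a leading ! in non-verbose output marks a repo whose cached metadata has expired. The sketch below only assumes that flagged lines start with !; the function name expired_repo_ids is hypothetical.

```shell
# Hypothetical helper: read `yum repolist` output on stdin and print the IDs
# of the repos whose line starts with "!".
expired_repo_ids() {
  awk '/^!/ { sub(/^!/, "", $1); print $1 }'
}

# Example: yum repolist | expired_repo_ids
```

If it prints anything, running yum clean all and then yum repolist again refreshes the cache, after which the marks would normally be expected to disappear.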

Contents of /etc/yum.repos.d/ambari.repo:

[anup@hdp25-node1 ~]$ cat /etc/yum.repos.d/ambari.repo
#VERSION_NUMBER=2.2.2.0-460


[Updates-ambari-2.2.2.0]
name=ambari-2.2.2.0 - Updates
baseurl=http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.2.0
gpgcheck=1
gpgkey=http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1

Contents of /etc/yum.repos.d/HDP.repo:

[anup@hdp25-node1 ~]$ cat /etc/yum.repos.d/hdp.repo
cat: /etc/yum.repos.d/hdp.repo: No such file or directory
[anup@hdp25-node1 ~]$ cat /etc/yum.repos.d/HDP.repo
[HDP-2.4]
name=HDP-2.4
baseurl=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.3.0


path=/
enabled=1
gpgcheck=0

Master Mentor

@Anup Shirolkar

Good progress. Can you paste the screenshot here? Now that the first problem is solved, I would advise you to accept my answer and open a new thread for the YARN UI issue; otherwise this thread will be too long to follow.

Thanks

Yes, closing this one is a good idea. I have accepted an answer; I hope that means the thread is closed.

Thanks

Master Mentor

Yes, open a new one for the RM/YARN UI. People usually ignore a long thread that has been there for ages.