
HDFS UI and History Server not starting after Automated Install with Ambari 2.1.1.0


I just ran through an Automated Install with Ambari 2.1.1.0. After the install, there were multiple services that were not able to start. I was able to start HDFS, but the Web UI doesn't work. Also, I am unable to start the History Server / MapReduce2. Here is the tail of ambari-agent.log on the SecondaryNameNode:

=====================================================================================================================================================================
INFO 2016-08-17 12:38:43,495 Heartbeat.py:78 - Building Heartbeat: {responseId = 8714, timestamp = 1471451923495, commandsInProgress = False, componentsMapped = True}
INFO 2016-08-17 12:38:43,498 Controller.py:255 - Heartbeat response received (id = 8715)
WARNING 2016-08-17 12:38:44,870 base_alert.py:140 - [Alert][hbase_master_cpu] Unable to execute alert. [Alert][hbase_master_cpu] Unable to extract JSON from JMX response
WARNING 2016-08-17 12:38:44,873 base_alert.py:140 - [Alert][regionservers_health_summary] Unable to execute alert. [Alert][regionservers_health_summary] Unable to extract JSON from JMX response
WARNING 2016-08-17 12:38:44,875 base_alert.py:140 - [Alert][ams_metrics_collector_hbase_master_cpu] Unable to execute alert. [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response
INFO 2016-08-17 12:38:44,877 logger.py:67 - Execute['export HIVE_CONF_DIR='/etc/hive/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://hdp-secondary.bateswhite.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/lib/hive/bin'], 'user': 'ambari-qa', 'timeout': 30}
INFO 2016-08-17 12:38:44,887 logger.py:67 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://0.0.0.0:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
WARNING 2016-08-17 12:38:44,897 base_alert.py:140 - [Alert][mapreduce_history_server_rpc_latency] Unable to execute alert. [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response
WARNING 2016-08-17 12:38:44,900 base_alert.py:140 - [Alert][mapreduce_history_server_cpu] Unable to execute alert. [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response
INFO 2016-08-17 12:38:53,498 Heartbeat.py:78 - Building Heartbeat: {responseId = 8715, timestamp = 1471451933498, commandsInProgress = False, componentsMapped = True}
INFO 2016-08-17 12:38:53,742 Controller.py:255 - Heartbeat response received (id = 8716)
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service HIVE of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service HBASE of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service OOZIE of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service ZOOKEEPER of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service SPARK of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service ACCUMULO of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,743 ActionQueue.py:99 - Adding STATUS_COMMAND for service ATLAS of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,744 ActionQueue.py:99 - Adding STATUS_COMMAND for service ACCUMULO of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,744 ActionQueue.py:99 - Adding STATUS_COMMAND for service ACCUMULO of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,744 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,744 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,744 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,744 ActionQueue.py:99 - Adding STATUS_COMMAND for service TEZ of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:38:53,744 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:39:03,745 Heartbeat.py:78 - Building Heartbeat: {responseId = 8716, timestamp = 1471451943744, commandsInProgress = False, componentsMapped = True}
INFO 2016-08-17 12:39:03,791 Controller.py:255 - Heartbeat response received (id = 8717)
INFO 2016-08-17 12:39:13,792 Heartbeat.py:78 - Building Heartbeat: {responseId = 8717, timestamp = 1471451953792, commandsInProgress = False, componentsMapped = True}
INFO 2016-08-17 12:39:13,854 Controller.py:255 - Heartbeat response received (id = 8718)
INFO 2016-08-17 12:39:13,854 ClusterConfiguration.py:123 - Updating cached configurations for cluster hdp_cluster_01
INFO 2016-08-17 12:39:13,875 ActionQueue.py:112 - Adding EXECUTION_COMMAND for role MAPREDUCE2_CLIENT for service MAPREDUCE2 of cluster hdp_cluster_01 to the queue.
INFO 2016-08-17 12:39:13,890 ActionQueue.py:232 - Executing command with id = 52-0 for role = MAPREDUCE2_CLIENT of cluster hdp_cluster_01.
WARNING 2016-08-17 12:39:13,902 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-259.txt' INFO 2016-08-17 12:39:13,921 Heartbeat.py:78 - Building Heartbeat: {responseId = 8718, timestamp = 1471451953891, commandsInProgress = True, componentsMapped = True} INFO 2016-08-17 12:39:14,041 Controller.py:255 - Heartbeat response received (id = 8719) INFO 2016-08-17 12:39:15,211 Heartbeat.py:78 - Building Heartbeat: {responseId = 8719, timestamp = 1471451955210, commandsInProgress = True, componentsMapped = True} INFO 2016-08-17 12:39:15,232 Controller.py:255 - Heartbeat response received (id = 8720) INFO 2016-08-17 12:39:25,233 Heartbeat.py:78 - Building Heartbeat: {responseId = 8720, timestamp = 1471451965233, commandsInProgress = False, componentsMapped = True} INFO 2016-08-17 12:39:25,300 Controller.py:255 - Heartbeat response received (id = 8721) INFO 2016-08-17 12:39:25,300 ClusterConfiguration.py:123 - Updating cached configurations for cluster hdp_cluster_01 INFO 2016-08-17 12:39:25,321 ActionQueue.py:112 - Adding EXECUTION_COMMAND for role HISTORYSERVER for service MAPREDUCE2 of cluster hdp_cluster_01 to the queue. INFO 2016-08-17 12:39:25,326 ActionQueue.py:232 - Executing command with id = 52-1 for role = HISTORYSERVER of cluster hdp_cluster_01. WARNING 2016-08-17 12:39:25,335 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-260.txt' INFO 2016-08-17 12:39:25,346 Heartbeat.py:78 - Building Heartbeat: {responseId = 8721, timestamp = 1471451965334, commandsInProgress = True, componentsMapped = True} INFO 2016-08-17 12:39:25,362 Controller.py:255 - Heartbeat response received (id = 8722) INFO 2016-08-17 12:39:25,951 PythonExecutor.py:114 - Command ['/usr/bin/python2.7', u'/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py', u'START', '/var/lib/ambari-agent/data/command-260.json', u'/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package', '/var/lib/ambari-agent/data/structured-out-260.json', 'INFO', '/var/lib/ambari-agent/data/tmp'] failed with exitcode=1 INFO 2016-08-17 12:39:25,965 PythonExecutor.py:124 - Command 'ps faux' returned 0. USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2 0.0 0.0 0 0 ? S Jun23 0:00 [kthreadd] root 3 0.0 0.0 0 0 ? S Jun23 0:00 \_ [ksoftirqd/0] root 7 0.0 0.0 0 0 ? S Jun23 0:00 \_ [migration/0] root 8 0.0 0.0 0 0 ? S Jun23 0:00 \_ [rcu_bh] root 9 0.0 0.0 0 0 ? S Jun23 0:00 \_ [rcuob/0] root 10 0.0 0.0 0 0 ? S Jun23 0:00 \_ [rcuob/1] root 11 0.0 0.0 0 0 ? S Jun23 0:32 \_ [rcu_sched] root 12 0.0 0.0 0 0 ? S Jun23 0:19 \_ [rcuos/0] root 13 0.0 0.0 0 0 ? S Jun23 0:23 \_ [rcuos/1] root 14 0.0 0.0 0 0 ? S Jun23 0:21 \_ [watchdog/0] root 15 0.0 0.0 0 0 ? S Jun23 0:13 \_ [watchdog/1] root 16 0.0 0.0 0 0 ? S Jun23 0:00 \_ [migration/1] root 17 0.0 0.0 0 0 ? S Jun23 0:00 \_ [ksoftirqd/1] root 20 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [khelper] root 21 0.0 0.0 0 0 ? S Jun23 0:00 \_ [kdevtmpfs] root 22 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [netns] root 23 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [perf] root 24 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [writeback] root 25 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [kintegrityd] root 26 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [bioset] root 27 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [kblockd] root 28 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [md] root 33 0.0 0.0 0 0 ? S Jun23 0:01 \_ [khungtaskd] root 34 0.0 0.0 0 0 ? S Jun23 0:00 \_ [kswapd0] root 35 0.0 0.0 0 0 ? SN Jun23 0:00 \_ [ksmd] root 36 0.0 0.0 0 0 ? 
SN Jun23 0:14 \_ [khugepaged] root 37 0.0 0.0 0 0 ? S Jun23 0:00 \_ [fsnotify_mark] root 38 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [crypto] root 46 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [kthrotld] root 48 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [kmpath_rdacd] root 49 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [kpsmoused] root 51 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [ipv6_addrconf] root 71 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [deferwq] root 101 0.0 0.0 0 0 ? S Jun23 0:00 \_ [kauditd] root 276 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [ata_sff] root 278 0.0 0.0 0 0 ? S Jun23 0:00 \_ [scsi_eh_0] root 279 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [events_power_ef] root 282 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [scsi_tmf_0] root 283 0.0 0.0 0 0 ? S Jun23 0:00 \_ [scsi_eh_1] root 284 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [scsi_tmf_1] root 288 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [ttm_swap] root 295 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [mpt_poll_0] root 296 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [mpt/0] root 302 0.0 0.0 0 0 ? S Jun23 0:00 \_ [scsi_eh_2] root 303 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [scsi_tmf_2] root 371 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [kdmflush] root 382 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [kdmflush] root 383 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [bioset] root 397 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfsalloc] root 398 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs_mru_cache] root 399 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs-buf/dm-0] root 400 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs-data/dm-0] root 401 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs-conv/dm-0] root 402 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs-cil/dm-0] root 403 0.0 0.0 0 0 ? S Jun23 0:29 \_ [xfsaild/dm-0] root 565 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs-buf/sda1] root 568 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs-data/sda1] root 569 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs-conv/sda1] root 570 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [xfs-cil/sda1] root 574 0.0 0.0 0 0 ? S Jun23 0:01 \_ [xfsaild/sda1] root 9762 0.0 0.0 0 0 ? S< Jun23 0:00 \_ [bioset] root 23443 0.0 0.0 0 0 ? S< Aug16 0:00 \_ [kworker/0:0H] root 23473 0.0 0.0 0 0 ? S< Aug16 0:00 \_ [kworker/0:1H] root 31521 0.0 0.0 0 0 ? S< Aug16 0:00 \_ [kworker/1:2H] root 3881 0.0 0.0 0 0 ? S 06:41 0:00 \_ [kworker/u4:2] root 5240 0.0 0.0 0 0 ? S 06:47 0:00 \_ [kworker/0:1] root 7196 0.0 0.0 0 0 ? S 11:40 0:00 \_ [kworker/u4:1] root 18308 0.0 0.0 0 0 ? S 12:19 0:00 \_ [kworker/0:2] root 18312 0.0 0.0 0 0 ? S 12:20 0:00 \_ [kworker/1:2] root 20707 0.0 0.0 0 0 ? S 12:30 0:00 \_ [kworker/1:0] root 21649 0.0 0.0 0 0 ? S< 12:34 0:00 \_ [kworker/1:0H] root 21778 0.0 0.0 0 0 ? S 12:34 0:00 \_ [kworker/0:0] root 21946 0.0 0.0 0 0 ? S 12:35 0:00 \_ [kworker/1:1] root 21981 0.0 0.0 0 0 ? S 12:35 0:00 \_ [kworker/u4:0] root 22922 0.0 0.0 0 0 ? S< 12:39 0:00 \_ [kworker/1:1H] root 1 0.0 0.0 43704 6140 ? Ss Jun23 0:52 /usr/lib/systemd/systemd --system --deserialize 20 root 479 0.0 0.2 54336 18712 ? Ss Jun23 0:21 /usr/lib/systemd/systemd-journald root 487 0.0 0.0 129132 5952 ? Ss Jun23 0:00 /usr/sbin/lvmetad -f root 500 0.0 0.0 46400 5112 ? Ss Jun23 0:00 /usr/lib/systemd/systemd-udevd root 599 0.0 0.0 116724 1624 ? S<sl Jun23 0:03 /sbin/auditd -n root 621 0.0 0.2 259072 16888 ? Ss Jun23 46:12 /usr/bin/vmtoolsd root 622 0.0 0.0 19308 1268 ? Ss Jun23 1:59 /usr/sbin/irqbalance --foreground root 625 0.0 0.1 580232 8424 ? Ssl Jun23 0:05 /usr/sbin/NetworkManager --no-daemon root 654 0.0 0.1 110524 15804 ? 
S Jun23 0:00 \_ /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eno16777984.pid -lf /var/lib/NetworkManager/dhclient-40a415cc-6ae4-d4b8-bd92-937a538a1522-eno16777984.lease -cf /var/lib/NetworkManager/dhclient-eno16777984.conf eno16777984 root 627 0.0 0.1 254652 13176 ? Ssl Jun23 0:02 /usr/sbin/rsyslogd -n root 628 0.0 0.0 26896 2284 ? Ss Jun23 0:20 /usr/lib/systemd/systemd-logind dbus 629 0.0 0.0 100624 1872 ? Ssl Jun23 0:10 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation polkitd 651 0.0 0.1 527576 11100 ? Ssl Jun23 0:02 /usr/lib/polkit-1/polkitd --no-debug root 652 0.0 0.0 53064 2652 ? Ss Jun23 0:00 /usr/sbin/wpa_supplicant -u -f /var/log/wpa_supplicant.log -c /etc/wpa_supplicant/wpa_supplicant.conf -u -f /var/log/wpa_supplicant.log -P /var/run/wpa_supplicant.pid root 655 0.0 0.0 110036 844 tty1 Ss+ Jun23 0:00 /sbin/agetty --noclear tty1 linux root 856 0.0 0.0 82560 3608 ? Ss Jun23 0:00 /usr/sbin/sshd -D root 10141 0.0 0.0 140788 5140 ? Ss 11:53 0:00 \_ sshd: root@pts/0 root 10144 0.0 0.0 115384 2032 pts/0 Ss+ 11:53 0:00 \_ -bash root 858 0.0 0.2 553072 16396 ? Ssl Jun23 6:58 /usr/bin/python -Es /usr/sbin/tuned -l -P root 1390 0.0 0.0 91140 2168 ? Ss Jun23 0:16 /usr/libexec/postfix/master -w postfix 1454 0.0 0.0 91312 3904 ? S Jun23 0:02 \_ qmgr -l -t unix -u postfix 6056 0.0 0.0 92104 4084 ? S 11:35 0:00 \_ pickup -l -t unix -u ntp 22481 0.0 0.0 29408 2060 ? Ss Aug15 0:00 /usr/sbin/ntpd -u ntp:ntp -g root 23943 0.0 0.2 228132 18100 ? S Aug16 0:00 /usr/bin/python2.7 /usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py start --expected-hostname=hdp-secondary.bateswhite.com root 23951 1.5 0.5 1294548 46384 ? Sl Aug16 21:50 \_ /usr/bin/python2.7 /usr/lib/python2.6/site-packages/ambari_agent/main.py start --expected-hostname=hdp-secondary.bateswhite.com root 22985 0.0 0.0 151168 1892 ? R 12:39 0:00 \_ ps faux mysql 24705 0.0 2.3 1250528 191652 ? Sl Aug16 0:31 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid root 26685 0.0 0.0 126328 1684 ? Ss Aug16 0:00 /usr/sbin/crond -n rpc 28598 0.0 0.0 64908 1048 ? Ss Aug16 0:00 /sbin/rpcbind -w atlas 355 0.1 7.1 3642084 571464 ? Sl Aug16 1:31 /usr/jdk64/jdk1.8.0_40/bin/java -Datlas.log.dir=/var/log/atlas -Datlas.log.file=application.log -Datlas.home=/usr/hdp/2.3.6.0-3796/atlas -Datlas.conf=/etc/atlas/conf -Xmx1024m -classpath /etc/atlas/conf:/var/lib/atlas/server/webapp/atlas/WEB-INF/classes:/var/lib/atlas/server/webapp/atlas/WEB-INF/lib/*:/usr/hdp/2.3.6.0-3796/atlas/libext/* org.apache.atlas.Main -app /var/lib/atlas/server/webapp/atlas --port 21000 ams 465 0.4 0.1 582648 13796 ? Sl Aug16 5:46 /usr/bin/python2.7 /usr/lib/python2.6/site-packages/resource_monitoring/main.py start zookeep+ 549 0.0 1.3 3565596 111532 ? 
Sl Aug16 0:37 /usr/jdk64/jdk1.8.0_40/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.log.file=zookeeper-zookeeper-server-hdp-secondary.log -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp /usr/hdp/current/zookeeper-server/bin/../build/classes:/usr/hdp/current/zookeeper-server/bin/../build/lib/*.jar:/usr/hdp/current/zookeeper-server/bin/../lib/xercesMinimal-1.9.6.2.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-provider-api-2.4.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-http-shared4-2.4.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-http-shared-1.0-beta-6.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-http-lightweight-1.0-beta-6.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-http-2.4.jar:/usr/hdp/current/zookeeper-server/bin/../lib/wagon-file-1.0-beta-6.jar:/usr/hdp/current/zookeeper-server/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/slf4j-api-1.6.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/plexus-utils-3.0.8.jar:/usr/hdp/current/zookeeper-server/bin/../lib/plexus-interpolation-1.11.jar:/usr/hdp/current/zookeeper-server/bin/../lib/plexus-container-default-1.0-alpha-9-stable-1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/netty-3.7.0.Final.jar:/usr/hdp/current/zookeeper-server/bin/../lib/nekohtml-1.9.6.2.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-settings-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-repository-metadata-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-project-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-profile-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-plugin-registry-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-model-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-error-diagnostics-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-artifact-manager-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-artifact-2.2.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/maven-ant-tasks-2.1.3.jar:/usr/hdp/current/zookeeper-server/bin/../lib/log4j-1.2.16.jar:/usr/hdp/current/zookeeper-server/bin/../lib/jsoup-1.7.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/jline-0.9.94.jar:/usr/hdp/current/zookeeper-server/bin/../lib/httpcore-4.2.3.jar:/usr/hdp/current/zookeeper-server/bin/../lib/httpclient-4.2.3.jar:/usr/hdp/current/zookeeper-server/bin/../lib/commons-logging-1.1.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/commons-io-2.2.jar:/usr/hdp/current/zookeeper-server/bin/../lib/commons-codec-1.6.jar:/usr/hdp/current/zookeeper-server/bin/../lib/classworlds-1.1-alpha-2.jar:/usr/hdp/current/zookeeper-server/bin/../lib/backport-util-concurrent-3.1.jar:/usr/hdp/current/zookeeper-server/bin/../lib/ant-launcher-1.8.0.jar:/usr/hdp/current/zookeeper-server/bin/../lib/ant-1.8.0.jar:/usr/hdp/current/zookeeper-server/bin/../zookeeper-3.4.6.2.3.6.0-3796.jar:/usr/hdp/current/zookeeper-server/bin/../src/java/lib/*.jar:/usr/hdp/current/zookeeper-server/conf::/usr/share/zookeeper/*:/usr/share/zookeeper/* -Xmx1024m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/hdp/current/zookeeper-server/conf/zoo.cfg hdfs 1218 0.0 3.5 2853372 287432 ? 
Sl Aug16 1:04 /usr/jdk64/jdk1.8.0_40/bin/java -Dproc_secondarynamenode -Xmx1024m -Dhdp.version=2.3.6.0-3796 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.6.0-3796/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.6.0-3796 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-secondarynamenode-hdp-secondary.log -Dhadoop.home.dir=/usr/hdp/2.3.6.0-3796/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608161405 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node" -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608161405 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node" -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608161405 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node" -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode INFO 2016-08-17 12:39:25,978 PythonExecutor.py:124 - Command 'netstat -tulpn' returned 0. 
Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 127.0.1.1:50090 0.0.0.0:* LISTEN 1218/java tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 28598/rpcbind tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 856/sshd tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1390/master tcp 0 0 0.0.0.0:8670 0.0.0.0:* LISTEN 23951/python2.7 tcp6 0 0 :::21000 :::* LISTEN 355/java tcp6 0 0 :::3306 :::* LISTEN 24705/mysqld tcp6 0 0 :::111 :::* LISTEN 28598/rpcbind tcp6 0 0 :::9200 :::* LISTEN 355/java tcp6 0 0 127.0.1.1:3888 :::* LISTEN 549/java tcp6 0 0 :::22 :::* LISTEN 856/sshd tcp6 0 0 :::52956 :::* LISTEN 549/java tcp6 0 0 :::2181 :::* LISTEN 549/java udp 0 0 0.0.0.0:68 0.0.0.0:* 654/dhclient udp 0 0 0.0.0.0:111 0.0.0.0:* 28598/rpcbind udp 0 0 172.20.7.223:123 0.0.0.0:* 22481/ntpd udp 0 0 127.0.0.1:123 0.0.0.0:* 22481/ntpd udp 0 0 0.0.0.0:123 0.0.0.0:* 22481/ntpd udp 0 0 0.0.0.0:789 0.0.0.0:* 28598/rpcbind udp 0 0 0.0.0.0:9429 0.0.0.0:* 654/dhclient udp6 0 0 :::7617 :::* 654/dhclient udp6 0 0 :::111 :::* 28598/rpcbind udp6 0 0 fe80::250:56ff:fe8b:123 :::* 22481/ntpd udp6 0 0 ::1:123 :::* 22481/ntpd udp6 0 0 :::123 :::* 22481/ntpd udp6 0 0 :::789 :::* 28598/rpcbind INFO 2016-08-17 12:39:25,981 Heartbeat.py:78 - Building Heartbeat: {responseId = 8722, timestamp = 1471451965981, commandsInProgress = True, componentsMapped = True} INFO 2016-08-17 12:39:25,994 Controller.py:255 - Heartbeat response received (id = 8723) ===================================================================================================================================================================== No logs exist in /var/log/hadoop-mapreduce/mapred Any help with this would be appreciated.


Re: HDFS UI and History Server not starting after Automated Install with Ambari 2.1.1.0

Super Guru

@Clay McDonald - Can you please check the NameNode logs ( /var/log/hadoop/hdfs/hadoop-hdfs-namenode-blah-blah )? It's hard to troubleshoot from the ambari-agent log alone.
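
For example, a minimal way to pull the relevant portion (a sketch assuming the default HDP log directory; the exact file name depends on your host name, hence the wildcard):

# Run on the NameNode host
tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
# Or look just for startup failures
grep -iE "ERROR|FATAL|Exception" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -n 50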

Re: HDFS UI and History Server not starting after Automated Install with Ambari 2.1.1.0

Now the NameNode service will not even start from Ambari. Here is the NameNode log tail.

2016-08-18 13:48:43,811 WARN namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(690)) - Encountered exception loading fsimage
java.io.FileNotFoundException: /hadoop/hdfs/namenode/current/VERSION (Permission denied)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at org.apache.hadoop.hdfs.server.common.StorageInfo.readPropertiesFile(StorageInfo.java:245)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.readProperties(NNStorage.java:627)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:339)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:726)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
2016-08-18 13:48:43,818 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@hdp-master.bateswhite.com:50070
2016-08-18 13:48:43,821 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(211)) - Stopping NameNode metrics system...
2016-08-18 13:48:43,881 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread interrupted.
2016-08-18 13:48:43,883 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(217)) - NameNode metrics system stopped.
2016-08-18 13:48:43,884 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(605)) - NameNode metrics system shutdown complete.
2016-08-18 13:48:43,886 ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start namenode.
java.io.FileNotFoundException: /hadoop/hdfs/namenode/current/VERSION (Permission denied)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at org.apache.hadoop.hdfs.server.common.StorageInfo.readPropertiesFile(StorageInfo.java:245)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.readProperties(NNStorage.java:627)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:339)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:726)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
2016-08-18 13:48:43,888 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-08-18 13:48:43,890 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hdp-master.bateswhite.com/127.0.1.1
************************************************************/

Re: HDFS UI and History Server not starting after Automated Install with Ambari 2.1.1.0

OK, the HDFS permissions have been corrected. I'm not sure how they were changed in the first place. I corrected this with the following:

chown -R hdfs:hdfs /hadoop/hdfs/namenode
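
As a quick sanity check before restarting the NameNode (a sketch assuming the default metadata path shown in the log above), the whole directory tree should now be owned by hdfs:hdfs:

# Confirm ownership of the NameNode metadata directory and the VERSION file
ls -ld /hadoop/hdfs/namenode /hadoop/hdfs/namenode/current
ls -l /hadoop/hdfs/namenode/current/VERSION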

Re: HDFS UI and History Server not starting after Automated Install with Ambari 2.1.1.0

I have been attempting to troubleshoot errors after a fresh Ambari install.

When I test whether port 50070 is reachable on hdp-master, I get the following:

[root@hdp-master ~]# curl -s hdp-master:50070 >/dev/null && echo Connected. || echo Fail.

Connected.

But when I run it from a different server, the command fails:

[root@hdp-ambari /]# curl -s hdp-master:50070 >/dev/null && echo Connected. || echo Fail.

Fail.

Also, netstat says it is listening.

[root@hdp-master ~]# netstat -a | grep 50070

tcp 0 0 hdp-master:50070 0.0.0.0:* LISTEN
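
Note that the hostname-resolving output above does not show which address the socket is actually bound to. A numeric listing (assuming net-tools is installed) makes that explicit:

# -t TCP, -l listening sockets, -n numeric addresses, -p owning process
netstat -tlnp | grep 50070
# If this shows 127.0.1.1:50070 or 127.0.0.1:50070 rather than 0.0.0.0:50070 or the
# host's real IP, remote clients will fail even though a local curl succeeds.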

SELinux is Disabled.

[root@hdp-master ~]# getenforce

Disabled

I’ve confirmed that the firewall on the server is disabled.

[root@hdp-master ~]# systemctl status firewalld

● firewalld.service

Loaded: not-found (Reason: No such file or directory)

Active: inactive (dead)

Re: HDFS UI and History Server not starting after Automated Install with Ambari 2.1.1.0

Contributor

@Clay McDonald Can you paste your error?

Re: HDFS UI and History Server not starting after Automated Install with Ambari 2.1.1.0

Super Guru

@Clay McDonald - There could be a firewall running on your hdp-ambari server. Can you please check whether you are able to telnet to the NameNode on port 50070 from the Ambari server? If telnet works, then your curl should work as well.
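
For example (a sketch, assuming telnet or nc is available on the Ambari host):

# From hdp-ambari, test raw TCP connectivity to the NameNode UI port
telnet hdp-master 50070
# or with netcat
nc -vz hdp-master 50070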

Re: HDFS UI and History Server not starting after Automated Install with Ambari 2.1.1.0

The Web UIs are all responding now and the History Server is running. During setup, I elected not to update the hosts file because I thought that as long as the nodes were in DNS and could be resolved by forward and reverse lookup, that step was not necessary. That assumption was wrong. After I updated the hosts file and rebooted the nodes, everything started working.
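
For anyone hitting the same symptom: this matches the NameNode/SecondaryNameNode web ports binding to 127.0.1.1, as seen in the netstat output earlier in the thread. A minimal /etc/hosts sketch (the IP addresses below are examples only) that maps each node's FQDN to its real network address on every host:

# /etc/hosts on every cluster node -- example addresses only
172.20.7.222   hdp-master.bateswhite.com    hdp-master
172.20.7.223   hdp-secondary.bateswhite.com hdp-secondary
# Avoid entries that map the host's own FQDN to 127.0.1.1, which some OS installers add by default.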