05-16-2017
07:07 PM
Now that I think more about this, I guess that "Unable to connect to: https://hadoop-m:8441/agent/v1/register/hadoop-m.c.hdp-1-163209.internal" started to occur when the ambari server started to throw NPEs. I detected those right after (re)starting it, which I did right after copying HUE into Ambari's stack with: sudo git clone https://github.com/EsharEditor/ambari-hue-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/HUE (which are the very first steps of the HUE-via-Ambari installation guide).
Regarding FQDNs, ambari server <-> ambari agent connectivity, and the handshake/registration ports (8440/8441), this is what I see on each host.
@hadoop-m
$ more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.132.0.4 hadoop-m.c.hdp-1-163209.internal hadoop-m # Added by Google
169.254.169.254 metadata.google.internal # Added by Google
$ hostname -f
hadoop-m.c.hdp-1-163209.internal
@hadoop-w-0
$ more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.132.0.2 hadoop-w-0.c.hdp-1-163209.internal hadoop-w-0 # Added by Google
169.254.169.254 metadata.google.internal # Added by Google
$ hostname -f
hadoop-w-0.c.hdp-1-163209.internal
$ telnet hadoop-m 8440
Trying 10.132.0.4...
Connected to hadoop-m.
Escape character is '^]'.
$ telnet hadoop-m 8441
Trying 10.132.0.4...
Connected to hadoop-m.
Escape character is '^]'.
$ openssl s_client -connect hadoop-m:8440
CONNECTED(00000003)
(... I removed lines ...)
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIFnDCCA4SgAwIBAgIBATAN (...I removed rest...)
$ openssl s_client -connect hadoop-m:8441
CONNECTED(00000003)
(... I removed lines ...)
-----BEGIN CERTIFICATE-----
MIIFnDCCA4SgAwIBAgIBAT (...I removed rest...)
@hadoop-w-1
$ more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.132.0.3 hadoop-w-1.c.hdp-1-163209.internal hadoop-w-1 # Added by Google
169.254.169.254 metadata.google.internal # Added by Google
$ hostname -f
hadoop-w-1.c.hdp-1-163209.internal
$ telnet hadoop-m 8440
Trying 10.132.0.4...
Connected to hadoop-m.
Escape character is '^]'.
$ telnet hadoop-m 8441
Trying 10.132.0.4...
Connected to hadoop-m.
Escape character is '^]'.
$ openssl s_client -connect hadoop-m:8440
CONNECTED(00000003)
(... I removed lines ...)
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIFnDCCA4SgAwIBAgIBATAN (...I removed rest...)
$ openssl s_client -connect hadoop-m:8441
CONNECTED(00000003)
(... I removed lines ...)
-----BEGIN CERTIFICATE-----
MIIFnDCCA4SgAwIBAgIBAT (...I removed rest...)
It looks as it should, doesn't it?
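For completeness, the same per-host checks could be scripted in one pass from the master. This is only a rough sketch, not part of the HUE installation guide: the host list, SSH access from hadoop-m, and the use of bash's /dev/tcp for the port test are all assumptions.
#!/usr/bin/env bash
# Rough sketch: repeat the FQDN and 8440/8441 port checks on every node.
# Host names are taken from this thread; adjust to your cluster.
HOSTS="hadoop-m hadoop-w-0 hadoop-w-1"
SERVER="hadoop-m"
for h in $HOSTS; do
  echo "== $h =="
  ssh "$h" 'hostname -f'            # FQDN the agent registers with
  for port in 8440 8441; do         # handshake / registration ports
    if ssh "$h" "timeout 3 bash -c 'cat < /dev/null > /dev/tcp/$SERVER/$port'" 2>/dev/null; then
      echo "  $SERVER:$port reachable"
    else
      echo "  $SERVER:$port NOT reachable"
    fi
  done
done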
05-16-2017
07:06 PM
Btw, for the record, and related to my comment below and your suggestions: as guessed, no HUE entries are found in the ambari DB. E.g.:
ambari=> select * from hostcomponentstate where service_name = 'HUE';
 id | cluster_id | component_name | version | current_stack_id | current_state | host_id | service_name | upgrade_state | security_state
----+------------+----------------+---------+------------------+---------------+---------+--------------+---------------+----------------
(0 rows)
whereas there are entries for Zeppelin, e.g.:
ambari=> select * from hostcomponentstate where service_name = 'ZEPPELIN';
 id  | cluster_id | component_name  | version | current_stack_id | current_state | host_id | service_name | upgrade_state | security_state
-----+------------+-----------------+---------+------------------+---------------+---------+--------------+---------------+----------------
 101 |          2 | ZEPPELIN_MASTER | UNKNOWN |                1 | UNKNOWN       |       2 | ZEPPELIN     | NONE          | UNKNOWN
(1 row)
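Just in case it helps, the same kind of check could be extended to the other service-related tables to make sure the aborted HUE install left nothing behind. This is only a sketch: it assumes a PostgreSQL-backed Ambari database named "ambari" with user "ambari", and the table names listed are assumptions about this Ambari version's schema.
# Sketch: look for leftover HUE rows in other Ambari service tables.
# Assumes PostgreSQL, database "ambari", user "ambari"; table names are assumptions.
for t in clusterservices servicedesiredstate servicecomponentdesiredstate hostcomponentdesiredstate hostcomponentstate; do
  echo "== $t =="
  psql -U ambari -d ambari -c "select * from $t where service_name = 'HUE';"
done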
05-16-2017
04:36 PM
Thanks for your suggestions. I'd say that, in that case, it is a communication issue, but I'll definitely have a look.
05-16-2017
04:32 PM
Thanks for the pointers. The problem is that my HUE installation never got that far. I only had time to run:
sudo git clone https://github.com/EsharEditor/ambari-hue-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/HUE
and then:
service ambari-server restart
and the problems started. I could log in to Ambari Web, but there were no heartbeats, it was obviously not possible to add the HUE service, and basically not possible to do much of anything. As I mentioned in my initial question, before attempting to install HUE via Ambari I installed Zeppelin, also via Ambari. That installation seemed to go well - ambari-server started fine that time and the Zeppelin UI works well. Before Zeppelin, I don't recall having added any services beyond the ones I chose during the HDP-2.4 installation on Google Cloud. Hmm.
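Given that the NPEs and lost heartbeats appeared immediately after that clone and restart, one way to test cause and effect (not a documented fix, just reversing the last change) would be to remove the copied HUE service definition and restart the server. A minimal sketch, assuming the same $VERSION used in the clone:
# Sketch: undo the step that immediately preceded the failures.
# $VERSION is assumed to be the same stack version used in the git clone above.
sudo rm -rf /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/HUE
sudo service ambari-server restart
# then watch for further NullPointerExceptions:
sudo tail -f /var/log/ambari-server/ambari-server.log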
05-16-2017
03:30 PM
Hi! -- I have an HDP-2.4 cluster installed on Google Cloud that had been working well until now - a month after the first installation. The installation was done following the steps given here. After installation, Ambari allowed me to add new services and monitor the cluster. Shortly after the initial installation, I added Zeppelin without problems, following the instructions given here. There was no problem restarting Ambari, and Zeppelin worked just fine. The issues started this past Friday, right after adding Hue following the instructions given here. After restarting the ambari server and agents, the Ambari GUI reported "lost heartbeat" everywhere. Several restarts of both the ambari server and agents did not help. So I went to the log files.
The ambari-server.log is full of this repeating pattern:
15 May 2017 07:16:10,841 INFO [qtp-ambari-client-25] MetricsReportPropertyProvider:153 - METRICS_COLLECTOR is not live. Skip populating resources with metrics, next message will be logged after 1000 attempts.
15 May 2017 07:16:14,000 INFO [qtp-ambari-agent-3887] HeartBeatHandler:927 - agentOsType = centos6
15 May 2017 07:16:14,015 ERROR [qtp-ambari-client-26] ReadHandler:91 - Caught a runtime exception executing a query
java.lang.NullPointerException
15 May 2017 07:16:14,016 WARN [qtp-ambari-client-26] ServletHandler:563 - /api/v1/clusters/hadoop/requests
java.lang.NullPointerException
15 May 2017 07:16:14,067 INFO [qtp-ambari-agent-3887] HostImpl:285 - Received host registration, host=[hostname=hadoop-w-1,fqdn=hadoop-w-1.c.hdp-1-163209.internal,domain=c.hdp-1-163209.internal,architecture=x86_64,processorcount=2,physicalprocessorcount=2,osname=centos,osversion=6.8,osfamily=redhat,memory=7543344,uptime_hours=0,mounts=(available=37036076,mountpoint=/,used=11816500,percent=25%,size=51473368,device=/dev/sda1,type=ext4)(available=3771672,mountpoint=/dev/shm,used=0,percent=0%,size=3771672,device=tmpfs,type=tmpfs)(available=498928440,mountpoint=/mnt/pd1,used=2544104,percent=1%,size=528316088,device=/dev/sdb,type=ext4)]
, registrationTime=1494832574000, agentVersion=2.2.1.0
15 May 2017 07:16:14,076 WARN [qtp-ambari-agent-3887] ServletHandler:563 - /agent/v1/register/hadoop-w-1.c.hdp-1-163209.internal
java.lang.NullPointerException
15 May 2017 07:16:20,054 ERROR [qtp-ambari-client-22] ReadHandler:91 - Caught a runtime exception executing a query
java.lang.NullPointerException
15 May 2017 07:16:20,055 WARN [qtp-ambari-client-22] ServletHandler:563 - /api/v1/clusters/hadoop/requests
java.lang.NullPointerException
15 May 2017 07:16:22,139 INFO [qtp-ambari-agent-3863] HeartBeatHandler:927 - agentOsType = centos6
15 May 2017 07:16:22,156 INFO [qtp-ambari-agent-3863] HostImpl:285 - Received host registration, host=[hostname=hadoop-w-1,fqdn=hadoop-w-1.c.hdp-1-163209.internal,domain=c.hdp-1-163209.internal,architecture=x86_64,processorcount=2,physicalprocessorcount=2,osname=centos,osversion=6.8,osfamily=redhat,memory=7543344,uptime_hours=0,mounts=(available=37036076,mountpoint=/,used=11816500,percent=25%,size=51473368,device=/dev/sda1,type=ext4)(available=3771672,mountpoint=/dev/shm,used=0,percent=0%,size=3771672,device=tmpfs,type=tmpfs)(available=498928440,mountpoint=/mnt/pd1,used=2544104,percent=1%,size=528316088,device=/dev/sdb,type=ext4)]
, registrationTime=1494832582139, agentVersion=2.2.1.0
15 May 2017 07:16:22,159 WARN [qtp-ambari-agent-3863] ServletHandler:563 - /agent/v1/register/hadoop-w-1.c.hdp-1-163209.internal
java.lang.NullPointerException
15 May 2017 07:16:25,161 INFO [qtp-ambari-client-22] MetricsReportPropertyProvider:153 - METRICS_COLLECTOR is not live. Skip populating resources with metrics, next message will be logged after 1000 attempts.
15 May 2017 07:16:25,174 INFO [qtp-ambari-client-21] MetricsReportPropertyProvider:153 - METRICS_COLLECTOR is not live. Skip populating resources with metrics, next message will be logged after 1000 attempts.
15 May 2017 07:16:26,093 ERROR [qtp-ambari-client-22] ReadHandler:91 - Caught a runtime exception executing a query
java.lang.NullPointerException
15 May 2017 07:16:26,094 WARN [qtp-ambari-client-22] ServletHandler:563 - /api/v1/clusters/hadoop/requests
java.lang.NullPointerException
The ambari-server.out is full of:
May 15, 2017 7:16:59 AM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
java.lang.NullPointerException
The ambari-agent.log is full of:
INFO 2017-05-15 07:17:57,014 security.py:99 - SSL Connect being called.. connecting to the server
ERROR 2017-05-15 07:17:57,015 Controller.py:197 - Unable to connect to: https://hadoop-m:8441/agent/v1/register/hadoop-m.c.hdp-1-163209.internal
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 150, in registerWithServer
ret = self.sendRequest(self.registerUrl, data)
File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 425, in sendRequest
raise IOError('Request to {0} failed due to {1}'.format(url, str(exception)))
IOError: Request to https://hadoop-m:8441/agent/v1/register/hadoop-m.c.hdp-1-163209.internal failed due to [Errno 111] Connection refused
ERROR 2017-05-15 07:17:57,015 Controller.py:198 - Error:Request to https://hadoop-m:8441/agent/v1/register/hadoop-m.c.hdp-1-163209.internal failed due to [Errno 111] Connection refused
WARNING 2017-05-15 07:17:57,015 Controller.py:199 - Sleeping for 18 seconds and then trying again
The ambari-agent.out is full of:
ERROR 2017-05-15 06:57:24,691 Controller.py:197 - Unable to connect to: https://hadoop-m:8441/agent/v1/register/hadoop-m.c.hdp-1-163209.internal
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 150, in registerWithServer
ret = self.sendRequest(self.registerUrl, data)
File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 428, in sendRequest
+ '; Response: ' + str(response))
IOError: Response parsing failed! Request data: {"hardwareProfile": {"kernel": "Linux", "domain": "c.hdp-1-163209.internal", "physicalprocessorcount": 4, "kernelrelease": "2.6.32-642.15.1.el6.x86_64", "uptime_days": "0", "memorytotal": 15300920, "swapfree": "0.00 GB", "memorysize": 15300920, "osfamily": "redhat", "swapsize": "0.00 GB", "processorcount": 4, "netmask": "255.255.255.255", "timezone": "UTC", "hardwareisa": "x86_64", "memoryfree": 14113788, "operatingsystem": "centos", "kernelmajversion": "2.6", "kernelversion": "2.6.32", "macaddress": "42:01:0A:84:00:04", "operatingsystemrelease": "6.8", "ipaddress": "10.132.0.4", "hostname": "hadoop-m", "uptime_hours": "0", "fqdn": "hadoop-m.c.hdp-1-163209.internal", "id": "root", "architecture": "x86_64", "selinux": false, "mounts": [{"available": "37748016", "used": "11104560", "percent": "23%", "device": "/dev/sda1", "mountpoint": "/", "type": "ext4", "size": "51473368"}, {"available": "7650460", "used": "0", "percent": "0%", "device": "tmpfs", "mountpoint": "/dev/shm", "type": "tmpfs", "size": "7650460"}, {"available": "499130312", "used": "2342232", "percent": "1%", "device": "/dev/sdb", "mountpoint": "/mnt/pd1", "type": "ext4", "size": "528316088"}], "hardwaremodel": "x86_64", "uptime_seconds": "109", "interfaces": "eth0,lo"}, "currentPingPort": 8670, "prefix": "/var/lib/ambari-agent/data", "agentVersion": "2.2.1.0", "agentEnv": {"transparentHugePage": "never", "hostHealth": {"agentTimeStampAtReporting": 1494831444579, "activeJavaProcs": [], "liveServices": [{"status": "Healthy", "name": "ntpd", "desc": ""}]}, "reverseLookup": true, "alternatives": [], "umask": "18", "firewallName": "iptables", "stackFoldersAndFiles": [{"type": "directory", "name": "/etc/hadoop"}, {"type": "directory", "name": "/etc/hbase"}, {"type": "directory", "name": "/etc/hive"}, {"type": "directory", "name": "/etc/ganglia"}, {"type": "directory", "name": "/etc/oozie"}, {"type": "directory", "name": "/etc/sqoop"}, {"type": "directory", "name": "/etc/zookeeper"}, {"type": "directory", "name": "/etc/flume"}, {"type": "directory", "name": "/etc/storm"}, {"type": "directory", "name": "/etc/hive-hcatalog"}, {"type": "directory", "name": "/etc/tez"}, {"type": "directory", "name": "/etc/falcon"}, {"type": "directory", "name": "/etc/hive-webhcat"}, {"type": "directory", "name": "/etc/kafka"}, {"type": "directory", "name": "/etc/slider"}, {"type": "directory", "name": "/etc/storm-slider-client"}, {"type": "directory", "name": "/etc/mahout"}, {"type": "directory", "name": "/etc/spark"}, {"type": "directory", "name": "/etc/pig"}, {"type": "directory", "name": "/etc/accumulo"}, {"type": "directory", "name": "/etc/ambari-metrics-monitor"}, {"type": "directory", "name": "/etc/atlas"}, {"type": "directory", "name": "/var/run/hadoop"}, {"type": "directory", "name": "/var/run/hbase"}, {"type": "directory", "name": "/var/run/hive"}, {"type": "directory", "name": "/var/run/ganglia"}, {"type": "directory", "name": "/var/run/oozie"}, {"type": "directory", "name": "/var/run/sqoop"}, {"type": "directory", "name": "/var/run/zookeeper"}, {"type": "directory", "name": "/var/run/flume"}, {"type": "directory", "name": "/var/run/storm"}, {"type": "directory", "name": "/var/run/hive-hcatalog"}, {"type": "directory", "name": "/var/run/falcon"}, {"type": "directory", "name": "/var/run/webhcat"}, {"type": "directory", "name": "/var/run/hadoop-yarn"}, {"type": "directory", "name": "/var/run/hadoop-mapreduce"}, {"type": "directory", "name": "/var/run/kafka"}, {"type": "directory", "name": 
"/var/run/spark"}, {"type": "directory", "name": "/var/run/accumulo"}, {"type": "directory", "name": "/var/run/ambari-metrics-monitor"}, {"type": "directory", "name": "/var/run/atlas"}, {"type": "directory", "name": "/var/log/hadoop"}, {"type": "directory", "name": "/var/log/hbase"}, {"type": "directory", "name": "/var/log/hive"}, {"type": "directory", "name": "/var/log/oozie"}, {"type": "directory", "name": "/var/log/sqoop"}, {"type": "directory", "name": "/var/log/zookeeper"}, {"type": "directory", "name": "/var/log/flume"}, {"type": "directory", "name": "/var/log/storm"}, {"type": "directory", "name": "/var/log/hive-hcatalog"}, {"type": "directory", "name": "/var/log/falcon"}, {"type": "directory", "name": "/var/log/hadoop-yarn"}, {"type": "directory", "name": "/var/log/hadoop-mapreduce"}, {"type": "directory", "name": "/var/log/kafka"}, {"type": "directory", "name": "/var/log/spark"}, {"type": "directory", "name": "/var/log/accumulo"}, {"type": "directory", "name": "/var/log/ambari-metrics-monitor"}, {"type": "directory", "name": "/var/log/atlas"}, {"type": "directory", "name": "/usr/lib/flume"}, {"type": "directory", "name": "/usr/lib/storm"}, {"type": "directory", "name": "/var/lib/hive"}, {"type": "directory", "name": "/var/lib/ganglia"}, {"type": "directory", "name": "/var/lib/oozie"}, {"type": "sym_link", "name": "/var/lib/hdfs"}, {"type": "directory", "name": "/var/lib/flume"}, {"type": "directory", "name": "/var/lib/hadoop-hdfs"}, {"type": "directory", "name": "/var/lib/hadoop-yarn"}, {"type": "directory", "name": "/var/lib/hadoop-mapreduce"}, {"type": "directory", "name": "/var/lib/slider"}, {"type": "directory", "name": "/var/lib/ganglia-web"}, {"type": "directory", "name": "/var/lib/spark"}, {"type": "directory", "name": "/var/lib/atlas"}, {"type": "directory", "name": "/hadoop/zookeeper"}, {"type": "directory", "name": "/hadoop/hdfs"}, {"type": "directory", "name": "/hadoop/storm"}, {"type": "directory", "name": "/hadoop/falcon"}, {"type": "directory", "name": "/hadoop/yarn"}, {"type": "directory", "name": "/kafka-logs"}], "existingUsers": [{"status": "Available", "name": "hadoop", "homeDir": "/home/hadoop"}, {"status": "Available", "name": "oozie", "homeDir": "/home/oozie"}, {"status": "Available", "name": "hive", "homeDir": "/home/hive"}, {"status": "Available", "name": "ambari-qa", "homeDir": "/home/ambari-qa"}, {"status": "Available", "name": "flume", "homeDir": "/home/flume"}, {"status": "Available", "name": "hdfs", "homeDir": "/home/hdfs"}, {"status": "Available", "name": "storm", "homeDir": "/home/storm"}, {"status": "Available", "name": "spark", "homeDir": "/home/spark"}, {"status": "Available", "name": "mapred", "homeDir": "/home/mapred"}, {"status": "Available", "name": "accumulo", "homeDir": "/home/accumulo"}, {"status": "Available", "name": "hbase", "homeDir": "/home/hbase"}, {"status": "Available", "name": "tez", "homeDir": "/home/tez"}, {"status": "Available", "name": "zookeeper", "homeDir": "/home/zookeeper"}, {"status": "Available", "name": "mahout", "homeDir": "/home/mahout"}, {"status": "Available", "name": "kafka", "homeDir": "/home/kafka"}, {"status": "Available", "name": "falcon", "homeDir": "/home/falcon"}, {"status": "Available", "name": "sqoop", "homeDir": "/home/sqoop"}, {"status": "Available", "name": "yarn", "homeDir": "/home/yarn"}, {"status": "Available", "name": "hcat", "homeDir": "/home/hcat"}, {"status": "Available", "name": "ams", "homeDir": "/home/ams"}, {"status": "Available", "name": "atlas", "homeDir": "/home/atlas"}], "firewallRunning": 
false}, "timestamp": 1494831444521, "hostname": "hadoop-m.c.hdp-1-163209.internal", "responseId": -1, "publicHostname": "hadoop-m.c.hdp-1-163209.internal"}; Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 Server Error</title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /agent/v1/register/hadoop-m.c.hdp-1-163209.internal. Reason:
<pre> Server Error</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
ERROR 2017-05-15 06:57:24,694 Controller.py:198 - Error:Response parsing failed! Request data: {"hardwareProfile": {"kernel": "Linux", "domain": "c.hdp-1-163209.internal", "physicalprocessorcount": 4, "kernelrelease": "2.6.32-642.15.1.el6.x86_64", "uptime_days": "0", "memorytotal": 15300920, "swapfree": "0.00 GB", "memorysize": 15300920, "osfamily": "redhat", "swapsize": "0.00 GB", "processorcount": 4, "netmask": "255.255.255.255", "timezone": "UTC", "hardwareisa": "x86_64", "memoryfree": 14113788, "operatingsystem": "centos", "kernelmajversion": "2.6", "kernelversion": "2.6.32", "macaddress": "42:01:0A:84:00:04", "operatingsystemrelease": "6.8", "ipaddress": "10.132.0.4", "hostname": "hadoop-m", "uptime_hours": "0", "fqdn": "hadoop-m.c.hdp-1-163209.internal", "id": "root", "architecture": "x86_64", "selinux": false, "mounts": [{"available": "37748016", "used": "11104560", "percent": "23%", "device": "/dev/sda1", "mountpoint": "/", "type": "ext4", "size": "51473368"}, {"available": "7650460", "used": "0", "percent": "0%", "device": "tmpfs", "mountpoint": "/dev/shm", "type": "tmpfs", "size": "7650460"}, {"available": "499130312", "used": "2342232", "percent": "1%", "device": "/dev/sdb", "mountpoint": "/mnt/pd1", "type": "ext4", "size": "528316088"}], "hardwaremodel": "x86_64", "uptime_seconds": "109", "interfaces": "eth0,lo"}, "currentPingPort": 8670, "prefix": "/var/lib/ambari-agent/data", "agentVersion": "2.2.1.0", "agentEnv": {"transparentHugePage": "never", "hostHealth": {"agentTimeStampAtReporting": 1494831444579, "activeJavaProcs": [], "liveServices": [{"status": "Healthy", "name": "ntpd", "desc": ""}]}, "reverseLookup": true, "alternatives": [], "umask": "18", "firewallName": "iptables", "stackFoldersAndFiles": [{"type": "directory", "name": "/etc/hadoop"}, {"type": "directory", "name": "/etc/hbase"}, {"type": "directory", "name": "/etc/hive"}, {"type": "directory", "name": "/etc/ganglia"}, {"type": "directory", "name": "/etc/oozie"}, {"type": "directory", "name": "/etc/sqoop"}, {"type": "directory", "name": "/etc/zookeeper"}, {"type": "directory", "name": "/etc/flume"}, {"type": "directory", "name": "/etc/storm"}, {"type": "directory", "name": "/etc/hive-hcatalog"}, {"type": "directory", "name": "/etc/tez"}, {"type": "directory", "name": "/etc/falcon"}, {"type": "directory", "name": "/etc/hive-webhcat"}, {"type": "directory", "name": "/etc/kafka"}, {"type": "directory", "name": "/etc/slider"}, {"type": "directory", "name": "/etc/storm-slider-client"}, {"type": "directory", "name": "/etc/mahout"}, {"type": "directory", "name": "/etc/spark"}, {"type": "directory", "name": "/etc/pig"}, {"type": "directory", "name": "/etc/accumulo"}, {"type": "directory", "name": "/etc/ambari-metrics-monitor"}, {"type": "directory", "name": "/etc/atlas"}, {"type": "directory", "name": "/var/run/hadoop"}, {"type": "directory", "name": "/var/run/hbase"}, {"type": "directory", "name": "/var/run/hive"}, {"type": "directory", "name": "/var/run/ganglia"}, {"type": "directory", "name": "/var/run/oozie"}, {"type": "directory", "name": "/var/run/sqoop"}, {"type": "directory", "name": "/var/run/zookeeper"}, {"type": "directory", "name": "/var/run/flume"}, {"type": "directory", "name": "/var/run/storm"}, {"type": "directory", "name": "/var/run/hive-hcatalog"}, {"type": "directory", "name": "/var/run/falcon"}, {"type": "directory", "name": "/var/run/webhcat"}, {"type": "directory", "name": "/var/run/hadoop-yarn"}, {"type": "directory", "name": "/var/run/hadoop-mapreduce"}, {"type": "directory", "name": 
"/var/run/kafka"}, {"type": "directory", "name": "/var/run/spark"}, {"type": "directory", "name": "/var/run/accumulo"}, {"type": "directory", "name": "/var/run/ambari-metrics-monitor"}, {"type": "directory", "name": "/var/run/atlas"}, {"type": "directory", "name": "/var/log/hadoop"}, {"type": "directory", "name": "/var/log/hbase"}, {"type": "directory", "name": "/var/log/hive"}, {"type": "directory", "name": "/var/log/oozie"}, {"type": "directory", "name": "/var/log/sqoop"}, {"type": "directory", "name": "/var/log/zookeeper"}, {"type": "directory", "name": "/var/log/flume"}, {"type": "directory", "name": "/var/log/storm"}, {"type": "directory", "name": "/var/log/hive-hcatalog"}, {"type": "directory", "name": "/var/log/falcon"}, {"type": "directory", "name": "/var/log/hadoop-yarn"}, {"type": "directory", "name": "/var/log/hadoop-mapreduce"}, {"type": "directory", "name": "/var/log/kafka"}, {"type": "directory", "name": "/var/log/spark"}, {"type": "directory", "name": "/var/log/accumulo"}, {"type": "directory", "name": "/var/log/ambari-metrics-monitor"}, {"type": "directory", "name": "/var/log/atlas"}, {"type": "directory", "name": "/usr/lib/flume"}, {"type": "directory", "name": "/usr/lib/storm"}, {"type": "directory", "name": "/var/lib/hive"}, {"type": "directory", "name": "/var/lib/ganglia"}, {"type": "directory", "name": "/var/lib/oozie"}, {"type": "sym_link", "name": "/var/lib/hdfs"}, {"type": "directory", "name": "/var/lib/flume"}, {"type": "directory", "name": "/var/lib/hadoop-hdfs"}, {"type": "directory", "name": "/var/lib/hadoop-yarn"}, {"type": "directory", "name": "/var/lib/hadoop-mapreduce"}, {"type": "directory", "name": "/var/lib/slider"}, {"type": "directory", "name": "/var/lib/ganglia-web"}, {"type": "directory", "name": "/var/lib/spark"}, {"type": "directory", "name": "/var/lib/atlas"}, {"type": "directory", "name": "/hadoop/zookeeper"}, {"type": "directory", "name": "/hadoop/hdfs"}, {"type": "directory", "name": "/hadoop/storm"}, {"type": "directory", "name": "/hadoop/falcon"}, {"type": "directory", "name": "/hadoop/yarn"}, {"type": "directory", "name": "/kafka-logs"}], "existingUsers": [{"status": "Available", "name": "hadoop", "homeDir": "/home/hadoop"}, {"status": "Available", "name": "oozie", "homeDir": "/home/oozie"}, {"status": "Available", "name": "hive", "homeDir": "/home/hive"}, {"status": "Available", "name": "ambari-qa", "homeDir": "/home/ambari-qa"}, {"status": "Available", "name": "flume", "homeDir": "/home/flume"}, {"status": "Available", "name": "hdfs", "homeDir": "/home/hdfs"}, {"status": "Available", "name": "storm", "homeDir": "/home/storm"}, {"status": "Available", "name": "spark", "homeDir": "/home/spark"}, {"status": "Available", "name": "mapred", "homeDir": "/home/mapred"}, {"status": "Available", "name": "accumulo", "homeDir": "/home/accumulo"}, {"status": "Available", "name": "hbase", "homeDir": "/home/hbase"}, {"status": "Available", "name": "tez", "homeDir": "/home/tez"}, {"status": "Available", "name": "zookeeper", "homeDir": "/home/zookeeper"}, {"status": "Available", "name": "mahout", "homeDir": "/home/mahout"}, {"status": "Available", "name": "kafka", "homeDir": "/home/kafka"}, {"status": "Available", "name": "falcon", "homeDir": "/home/falcon"}, {"status": "Available", "name": "sqoop", "homeDir": "/home/sqoop"}, {"status": "Available", "name": "yarn", "homeDir": "/home/yarn"}, {"status": "Available", "name": "hcat", "homeDir": "/home/hcat"}, {"status": "Available", "name": "ams", "homeDir": "/home/ams"}, {"status": "Available", "name": 
"atlas", "homeDir": "/home/atlas"}], "firewallRunning": false}, "timestamp": 1494831444521, "hostname": "hadoop-m.c.hdp-1-163209.internal", "responseId": -1, "publicHostname": "hadoop-m.c.hdp-1-163209.internal"}; Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 Server Error</title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /agent/v1/register/hadoop-m.c.hdp-1-163209.internal. Reason:
<pre> Server Error</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
WARNING 2017-05-15 06:57:24,694 Controller.py:199 - Sleeping for 24 seconds and then trying again
WARNING 2017-05-15 06:57:37,965 base_alert.py:417 - [Alert][namenode_rpc_latency] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-15 06:57:37,966 base_alert.py:417 - [Alert][namenode_hdfs_blocks_health] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-15 06:57:37,970 base_alert.py:417 - [Alert][namenode_hdfs_pending_deletion_blocks] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-15 06:57:37,972 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-15 06:57:37,973 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-15 06:57:37,977 base_alert.py:140 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
WARNING 2017-05-15 06:57:37,980 base_alert.py:140 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2017-05-15 06:57:37,982 base_alert.py:140 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2017-05-15 06:57:37,984 base_alert.py:417 - [Alert][namenode_hdfs_capacity_utilization] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-15 06:57:37,986 base_alert.py:140 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
INFO 2017-05-15 06:57:37,986 logger.py:67 - Mount point for directory /hadoop/hdfs/data is /
WARNING 2017-05-15 06:57:37,990 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-15 06:57:38,004 base_alert.py:140 - [Alert][regionservers_health_summary] Unable to execute alert. [Alert][regionservers_health_summary] Unable to extract JSON from JMX response
WARNING 2017-05-15 06:57:38,005 base_alert.py:140 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2017-05-15 06:57:38,007 base_alert.py:140 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2017-05-15 06:57:38,012 base_alert.py:140 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2017-05-15 06:57:38,016 logger.py:67 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2017-05-15 06:57:38,017 script_alert.py:112 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://hadoop-m.c.hdp-1-163209.internal:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 165, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen\n return _opener.open(url, data, timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 391, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.6/urllib2.py", line 409, in _open\n \'_open\', req)\n File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open\n raise URLError(err)\nURLError: <urlopen error [Errno 111] Connection refused>\n)']
ERROR 2017-05-15 06:57:38,017 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on hadoop-m.c.hdp-1-163209.internal']
The log files are getting pretty big, especially ambari-server.log:
@hadoop-m ~]$ ls -l /var/log/ambari-server/
total 53912
-rw-r--r--. 1 root root 132186 May 14 04:00 ambari-alerts.log
-rw-r--r--. 1 root root 26258 May 3 13:35 ambari-config-changes.log
-rw-r--r--. 1 root root 38989 May 12 15:46 ambari-eclipselink.log
-rw-r--r--. 1 root root 45011535 May 15 07:17 ambari-server.log
-rw-r--r--. 1 root root 9967341 May 15 07:17 ambari-server.out
The cluster is still in the preparation phase and is therefore small: just a master node and two worker nodes. Restarting the virtual machines corresponding to each node did not help either. The Ambari GUI shows "lost heartbeat" everywhere (e.g., for HDFS; screenshot not included here). I've tried every single trick I've found in community posts, although I haven't seen any report (yet) of a problem like mine, i.e., NPEs and heartbeat loss after restarting ambari-server (and the agents). Thanks in advance for comments and possible guidance.
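Since both the API requests and the agent registrations die with a bare java.lang.NullPointerException, the full stack trace behind it is probably the most useful next piece of evidence. A small sketch of how it could be pulled out of the server log, assuming the trace is actually written there (the excerpt above shows only the exception name) and using an arbitrary context length:
# Sketch: grab the first full NullPointerException stack trace, if logged.
sudo grep -m 1 -A 40 "java.lang.NullPointerException" /var/log/ambari-server/ambari-server.log
# And count occurrences, to confirm it lines up with the failed registrations:
sudo grep -c "java.lang.NullPointerException" /var/log/ambari-server/ambari-server.log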
04-05-2017
04:33 PM
Images are also broken in this tutorial: https://hortonworks.com/hadoop-tutorial/defining-processing-data-end-end-data-pipeline-apache-falcon/ I'd be very grateful if this could be fixed. Thanks in advance.
04-05-2017
02:37 PM
Hi! -- I am experiencing the same issue with the same tutorial (Falcon) in a Google Cloud-based HDP installation. I'd be grateful if you could add some more info regarding that permission issue. Thanks in advance.