Member since
03-20-2018
11
Posts
1
Kudos Received
0
Solutions
10-24-2018
11:34 AM
Hi @vpoornalingam, I am facing a similar issue. The master's .out file shows the following error:
[root@ip-172-31-47-215 hbase]# cat hbase-root-master-ip-172-31-47-215.us-west-2.compute.internal.out
Error: Could not find or load main class exists
When I look into the log file, I only get the following messages. No error is reported there:
Wed Oct 24 08:51:53 UTC 2018 Starting master on ip-172-31-47-215.us-west-2.compute.internal
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63362
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 63362
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Below is the output of the GC log:
[root@ip-172-31-47-215 hbase]# cat gc.log-201810240851
Java HotSpot(TM) 64-Bit Server VM (25.181-b13) for linux-amd64 JRE (1.8.0_181-b13), built on Jul 7 2018 00:56:38 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 16265880k(388408k free), swap 0k(0k free)
CommandLine flags: -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log -XX:InitialHeapSize=260254080 -XX:MaxHeapSize=1073741824 -XX:MaxNewSize=348966912 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Heap
par new generation total 76800K, used 10973K [0x00000000c0000000, 0x00000000c5350000, 0x00000000d4cc0000)
eden space 68288K, 16% used [0x00000000c0000000, 0x00000000c0ab7608, 0x00000000c42b0000)
from space 8512K, 0% used [0x00000000c42b0000, 0x00000000c42b0000, 0x00000000c4b00000)
to space 8512K, 0% used [0x00000000c4b00000, 0x00000000c4b00000, 0x00000000c5350000)
concurrent mark-sweep generation total 170688K, used 0K [0x00000000d4cc0000, 0x00000000df370000, 0x0000000100000000)
Metaspace used 2984K, capacity 4480K, committed 4480K, reserved 1056768K
class space used 313K, capacity 384K, committed 384K, reserved 1048576K
Any idea how to fix this?
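One possible reading of `Error: Could not find or load main class exists`, which nothing in this post confirms: the `java` launcher treats the first argument that is not an option as the main class, so a stray word such as `exists` ending up among the JVM options would produce exactly this message. The sketch below illustrates that rule only. `first_non_option` is a hypothetical helper, the argument list is made up for illustration, and only the classpath options are treated as taking a value, which is a simplification.

```python
# Options of the `java` launcher that consume the following token as their value.
# Simplified on purpose: only the classpath options are handled here.
OPTIONS_WITH_VALUE = {"-cp", "-classpath", "--class-path"}


def first_non_option(java_args):
    """Return the first token the launcher would treat as the main class."""
    skip_next = False
    for token in java_args:
        if skip_next:
            # This token is the value of the previous option (e.g. the classpath).
            skip_next = False
            continue
        if token in OPTIONS_WITH_VALUE:
            skip_next = True
            continue
        if token.startswith("-"):
            # Ordinary option such as -Xmx1g or -Dkey=value.
            continue
        return token
    return None


# Hypothetical argument list, not taken from the post.
args = ["-Xmx1g", "exists", "-Dhbase.log.dir=/var/log/hbase",
        "org.apache.hadoop.hbase.master.HMaster", "start"]
print(first_non_option(args))
```

If this is the cause, the full `java` command line of the failed start would show the stray token before the real main class, and the environment variables that feed the JVM options would be the place to look.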
07-17-2018
02:44 AM
Hi @Vinicius Higa Murakami, no, I actually created a new cluster. That is why you are seeing two different hostnames. Below are the hosts file entries:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.162.96.45 temp.tem1.org
[root@jazz1 ~]# cat /etc/sysconfig/network
# Created by cloud-init on instance boot automatically, do not edit.
#
NETWORKING=yes
hostname=temp.tem1.org
[root@jazz1 ~]# hostname --fqdn
temp.tem1.org
That is all the required information. However, I am still getting the errors below, and the NodeManager keeps going down.
NodeManager Web UI
Connection failed to http://temp.tem1.org:8042 (<urlopen error [Errno 111] Connection refused>)
NodeManager Health
Connection failed to http://temp.tem1.org:8042/ws/v1/node/info (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
url_response = urllib2.urlopen(query, timeout=connection_timeout)
File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
)
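`[Errno 111] Connection refused` usually means the address was reached but nothing was listening on port 8042, which fits a NodeManager process that is down rather than a name-resolution problem. A minimal sketch for checking this from the Ambari agent host follows. `probe_port` is a hypothetical helper, not part of Ambari, and the 3-second timeout is an arbitrary assumption.

```python
import errno
import socket


def probe_port(host, port, timeout=3.0):
    """Try a TCP connection and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition the alert reports as [Errno 111].
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Any other failure, e.g. the hostname could not be resolved.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))


# Example call (hostname and port taken from the alert text):
# probe_port("temp.tem1.org", 8042)
```

A result of "refused" points at the NodeManager process itself, while "timeout" or a resolution error would point at the network or the hosts file instead.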
07-16-2018
12:08 AM
I am also still getting the same alert in Ambari:
Connection failed to http://temp.tem1.org:8042/ws/v1/node/info (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
url_response = urllib2.urlopen(query, timeout=connection_timeout)
File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
)
07-15-2018
06:27 PM
Yes, for example:
2018-07-15 17:52:52,377 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(237)) - Exit code from container container_e01_1531677114201_0002_01_000001 is : 143
2018-07-15 18:22:17,588 WARN launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(113)) - Recovered container exited with a non-zero exit code 154
2018-07-15 18:22:19,193 WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(230)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users.
2018-07-15 17:51:39,597 ERROR nodemanager.NodeManager (LogAdapter.java:error(69)) - RECEIVED SIGNAL 15: SIGTERM
2018-07-15 18:22:17,587 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(98)) - Unable to recover container container_e01_1531677114201_0001_01_000001
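The container exit code 143 fits the `RECEIVED SIGNAL 15: SIGTERM` line: by the usual shell convention, a process killed by a signal reports 128 plus the signal number, and 128 + 15 = 143. A small sketch of that decoding follows. `describe_exit_code` is a hypothetical helper, and the mapping is only a hint, because values such as 154 may carry an application-specific meaning in YARN rather than a signal number.

```python
import signal


def describe_exit_code(code):
    """Describe an exit code using the shell's 128 + signal-number convention."""
    if code > 128:
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            # No signal with that number on this platform.
            return "exit code %d (no matching signal)" % code
    return "exit code %d" % code


print(describe_exit_code(143))
```

Read this way, the container did not crash on its own. Something sent it a termination signal, so the question becomes what stopped the NodeManager and its containers.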
07-15-2018
06:29 AM
Thanks Vinicius, I found that the NodeManager was not running on the DataNodes, so I started it manually using the command below:
/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager
However, it still keeps going down. There are container-related errors and warnings, but I have not been able to find the root cause.
Regards,
Laeeq
07-14-2018
11:23 AM
I just found that all NodeManagers are down again. Can anyone please provide a fix?
07-14-2018
11:20 AM
1 Kudo
Hello, I have set up a 6-node cluster (2M, 3D and 1E). The cluster was set up smoothly without any issues. However, Ambari shows the NodeManagers going down. In the ResourceManager UI, I could see that 1 node was active and the other 2 NodeManagers were down. I restarted the YARN services after executing the command below on all nodes, from the /var/log/hadoop-yarn/nodemanager/recovery-state directory (I found this solution on a forum):
rm -rf *
After restarting YARN, I can see all NodeManagers up in the ResourceManager, but Ambari is still showing Down alerts. Below are the alerts generated:
Connection failed to http://ip-172-31-32-138.us-west-2.compute.internal:8042/ws/v1/node/info (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
url_response = urllib2.urlopen(query, timeout=connection_timeout)
File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
)
Can anyone let me know how to get rid of this alert/error in Ambari?
Thanks and regards,
Laeeq
Labels: Apache Ambari
05-24-2018
08:30 AM
Hello, I am still getting the error below and am still not able to start Metron Indexing.
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py", line 59, in run_file
imp.load_source('__main__', script)
File "/var/lib/ambari-agent/cache/common-services/METRON/0.4.1.1.4.2.0/package/scripts/indexing_master.py", line 18, in <module>
import requests
File "/usr/lib/python2.7/site-packages/requests/__init__.py", line 53, in <module>
from .packages.urllib3.contrib import pyopenssl
File "/usr/lib/python2.7/site-packages/requests/packages/__init__.py", line 3, in <module>
from . import urllib3
File "/usr/lib/python2.7/site-packages/requests/packages/__init__.py", line 61, in load_module
AttributeError: 'NoneType' object has no attribute 'modules'