Support Questions
Find answers, ask questions, and share your expertise

Hbase master stops running after few seconds.

Explorer

Hbase master is installed in namenode (namenode.in.sop). when I start Hbase master, It get started but after few minutes it gets stopped automatically.

Alert message shown is : connection refused to namenode.in.sop:16000

Hbase logs :

Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 8011900k(1931220k free), swap 8257532k(8236804k free)
CommandLine flags: -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log -XX:InitialHeapSize=128190400 -XX:MaxHeapSize=1073741824 -XX:MaxNewSize=87244800 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
2019-01-10T16:27:12.008+0530: 1.191: [GC (Allocation Failure) 2019-01-10T16:27:12.008+0530: 1.191: [ParNew: 33856K->4223K(38080K), 0.0087958 secs] 33856K->5387K(122752K), 0.0089220 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:12.835+0530: 2.018: [GC (Allocation Failure) 2019-01-10T16:27:12.835+0530: 2.018: [ParNew: 38079K->3389K(38080K), 0.0174462 secs] 39243K->6595K(122752K), 0.0175369 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
2019-01-10T16:27:13.207+0530: 2.389: [GC (Allocation Failure) 2019-01-10T16:27:13.207+0530: 2.389: [ParNew: 37245K->4223K(38080K), 0.0094610 secs] 40451K->9480K(122752K), 0.0095239 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:13.631+0530: 2.813: [GC (Allocation Failure) 2019-01-10T16:27:13.631+0530: 2.813: [ParNew: 38079K->3698K(38080K), 0.0074865 secs] 43336K->9365K(122752K), 0.0075744 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:13.835+0530: 3.017: [GC (Allocation Failure) 2019-01-10T16:27:13.835+0530: 3.017: [ParNew: 37554K->1240K(38080K), 0.0055719 secs] 43221K->7778K(122752K), 0.0056540 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2019-01-10T16:27:14.407+0530: 3.589: [GC (Allocation Failure) 2019-01-10T16:27:14.407+0530: 3.589: [ParNew: 35096K->2431K(38080K), 0.0061526 secs] 41634K->8969K(122752K), 0.0062370 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:14.426+0530: 3.608: [GC (CMS Initial Mark) [1 CMS-initial-mark: 6538K(84672K)] 10971K(122752K), 0.0040071 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2019-01-10T16:27:14.434+0530: 3.616: [CMS-concurrent-mark-start]
2019-01-10T16:27:14.468+0530: 3.650: [CMS-concurrent-mark: 0.034/0.034 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 
2019-01-10T16:27:14.474+0530: 3.656: [CMS-concurrent-preclean-start]
2019-01-10T16:27:14.475+0530: 3.657: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2019-01-10T16:27:14.475+0530: 3.657: [CMS-concurrent-abortable-preclean-start]
2019-01-10T16:27:14.787+0530: 3.970: [GC (Allocation Failure) 2019-01-10T16:27:14.787+0530: 3.970: [ParNew: 36287K->2316K(38080K), 0.0095541 secs] 42825K->10118K(122752K), 0.0096321 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:15.104+0530: 4.286: [CMS-concurrent-abortable-preclean: 0.229/0.629 secs] [Times: user=0.59 sys=0.01, real=0.63 secs] 
2019-01-10T16:27:15.115+0530: 4.298: [GC (CMS Final Remark) [YG occupancy: 28884 K (38080 K)]2019-01-10T16:27:15.115+0530: 4.298: [Rescan (parallel) , 0.0077814 secs]2019-01-10T16:27:15.123+0530: 4.305: [weak refs processing, 0.0000331 secs]2019-01-10T16:27:15.123+0530: 4.305: [class unloading, 0.0030827 secs]2019-01-10T16:27:15.126+0530: 4.309: [scrub symbol table, 0.0027604 secs]2019-01-10T16:27:15.129+0530: 4.311: [scrub string table, 0.0004649 secs][1 CMS-remark: 7801K(84672K)] 36685K(122752K), 0.0145743 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:15.141+0530: 4.323: [CMS-concurrent-sweep-start]
2019-01-10T16:27:15.144+0530: 4.326: [CMS-concurrent-sweep: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:15.144+0530: 4.326: [CMS-concurrent-reset-start]
2019-01-10T16:27:15.147+0530: 4.330: [CMS-concurrent-reset: 0.004/0.004 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2019-01-10T16:27:15.155+0530: 4.337: [GC (Allocation Failure) 2019-01-10T16:27:15.155+0530: 4.337: [ParNew: 36172K->2520K(38080K), 0.0038867 secs] 41896K->8243K(122752K), 0.0039568 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2019-01-10T16:27:15.278+0530: 4.460: [GC (Allocation Failure) 2019-01-10T16:27:15.278+0530: 4.460: [ParNew: 36376K->1955K(38080K), 0.0046129 secs] 42099K->8052K(122752K), 0.0046859 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2019-01-10T16:27:15.503+0530: 4.686: [GC (Allocation Failure) 2019-01-10T16:27:15.503+0530: 4.686: [ParNew: 35811K->2243K(38080K), 0.0041595 secs] 41908K->8340K(122752K), 0.0042305 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
2019-01-10T16:27:15.849+0530: 5.032: [GC (Allocation Failure) 2019-01-10T16:27:15.849+0530: 5.032: [ParNew: 36099K->2268K(38080K), 0.0057452 secs] 42196K->8366K(122752K), 0.0058221 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:16.155+0530: 5.337: [GC (Allocation Failure) 2019-01-10T16:27:16.155+0530: 5.337: [ParNew: 36124K->2377K(38080K), 0.0069522 secs] 42222K->8832K(122752K), 0.0070305 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2019-01-10T16:27:17.130+0530: 6.312: [GC (Allocation Failure) 2019-01-10T16:27:17.130+0530: 6.312: [ParNew: 36233K->4224K(38080K), 0.0082606 secs] 42688K->10740K(122752K), 0.0083634 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:17.265+0530: 6.447: [GC (Allocation Failure) 2019-01-10T16:27:17.265+0530: 6.447: [ParNew: 38080K->2058K(38080K), 0.0092968 secs] 44596K->10896K(122752K), 0.0093922 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:17.769+0530: 6.951: [GC (Allocation Failure) 2019-01-10T16:27:17.769+0530: 6.951: [ParNew: 35914K->2100K(38080K), 0.0057748 secs] 44752K->10938K(122752K), 0.0058689 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:18.092+0530: 7.274: [GC (Allocation Failure) 2019-01-10T16:27:18.092+0530: 7.275: [ParNew: 35956K->3100K(38080K), 0.0056527 secs] 44794K->11938K(122752K), 0.0057448 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:18.933+0530: 8.115: [GC (Allocation Failure) 2019-01-10T16:27:18.933+0530: 8.115: [ParNew: 36956K->2513K(38080K), 0.0082556 secs] 45794K->11973K(122752K), 0.0083492 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-10T16:27:19.466+0530: 8.648: [GC (Allocation Failure) 2019-01-10T16:27:19.466+0530: 8.649: [ParNew: 36369K->2498K(38080K), 0.0092841 secs] 45829K->12789K(122752K), 0.0093821 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 

I am also getting GC allocation Failure in namenode logs

Namenode logs:

2018-09-13T17:25:13.341+0530: 163.098: [GC (Allocation Failure) 2018-09-13T17:25:13.341+0530: 163.098: [ParNew: 176060K->5173K(184320K), 0.0493439 secs] 180804K->13604K(1028096K), 0.0494549 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 
2018-09-13T17:25:15.043+0530: 164.800: [GC (Allocation Failure) 2018-09-13T17:25:15.043+0530: 164.800: [ParNew: 169013K->8331K(184320K), 0.1074649 secs] 177444K->16761K(1028096K), 0.1075756 secs] [Times: user=0.07 sys=0.00, real=0.10 secs] 
2018-09-13T17:25:15.850+0530: 165.607: [GC (Allocation Failure) 2018-09-13T17:25:15.850+0530: 165.607: [ParNew: 172171K->5445K(184320K), 0.1568449 secs] 180601K->13876K(1028096K), 0.1569780 secs] [Times: user=0.06 sys=0.00, real=0.16 secs] 
2018-09-13T17:25:16.709+0530: 166.467: [GC (Allocation Failure) 2018-09-13T17:25:16.709+0530: 166.467: [ParNew: 169285K->5954K(184320K), 0.0760959 secs] 177716K->14385K(1028096K), 0.0762063 secs] [Times: user=0.07 sys=0.00, real=0.08 secs] 
2018-09-13T17:25:26.130+0530: 175.888: [GC (Allocation Failure) 2018-09-13T17:25:26.130+0530: 175.888: [ParNew: 169794K->8792K(184320K), 0.0456534 secs] 178225K->17223K(1028096K), 0.0457743 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 
2018-09-13T17:30:54.415+0530: 504.173: [GC (Allocation Failure) 2018-09-13T17:30:54.415+0530: 504.173: [ParNew: 172632K->5359K(184320K), 0.0423570 secs] 181063K->13789K(1028096K), 0.0425261 secs] [Times: user=0.04 sys=0.00, real=0.04 secs] 
2018-09-13T17:58:33.507+0530: 2163.265: [GC (Allocation Failure) 2018-09-13T17:58:33.507+0530: 2163.265: [ParNew: 169199K->3440K(184320K), 0.0451056 secs] 177629K->13056K(1028096K), 0.0452152 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 
2018-09-13T18:21:41.935+0530: 3551.693: [GC (Allocation Failure) 2018-09-13T18:21:41.935+0530: 3551.693: [ParNew: 167280K->4107K(184320K), 0.0469423 secs] 176896K->14342K(1028096K), 0.0470430 secs] [Times: user=0.04 sys=0.00, real=0.05 secs] 
2018-09-13T19:01:27.263+0530: 5937.020: [GC (Allocation Failure) 2018-09-13T19:01:27.263+0530: 5937.021: [ParNew: 167947K->1769K(184320K), 0.0432569 secs] 178182K->12011K(1028096K), 0.0433850 secs] [Times: user=0.05 sys=0.00, real=0.04 secs] 
2018-09-13T19:43:59.792+0530: 8489.549: [GC (Allocation Failure) 2018-09-13T19:43:59.792+0530: 8489.549: [ParNew: 165609K->1652K(184320K), 0.0448316 secs] 175851K->11903K(1028096K), 0.0449217 secs] [Times: user=0.05 sys=0.00, real=0.04 secs] 
2 REPLIES 2

Super Collaborator

Hi @Shobhna Dhami,

Can you please check any OutOfMemoryError errors in the log file. please check /var/log/messages log file also. If it is OOM you need to upgrade the memory in that server


Explorer

Hi@subhash parise

Thanks for the reply.

I have checked ambari-server.log and ambari-agent.log file . Both the file does not show out of memory error.

/var/log/messages logs

systemd: Starting Session 7799 of user root. Jan 10 17:00:01 slnxhadoop01 systemd: Started Session 7799 of user root. Jan 10 17:01:01 slnxhadoop01 systemd: Starting Session 7800 of user root. Jan 10 17:01:01 slnxhadoop01 systemd: Started Session 7800 of user root. Jan 10 17:29:39 slnxhadoop01 NetworkManager[673]: <error> [1547121579.803564] [devices/nm-device.c:2303] activation_source_schedule(): (eno16777736): activation stage already scheduled Jan 10 17:32:42 slnxhadoop01 NetworkManager[673]: <error> [1547121762.303526] [devices/nm-device.c:2303] activation_source_schedule(): (eno16777736): activation stage already scheduled

; ;