About 10 days ago we upgraded Cloudera Manager from 4.8.5 to Cloudera Express 5.5.1. It worked fine until today, but over the last few hours Cloudera Manager has been hanging until it is restarted. Our cluster runs CDH4. We have not restarted the cluster since the Cloudera Manager upgrade, though this may not matter.
Has anybody faced a similar issue?
Thanks
RK
Here are a few lines from the CM server log:
2016-04-04 21:53:37,516 INFO JvmPauseMonitor:com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 5312ms: GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=5401ms
2016-04-04 21:53:44,972 INFO JvmPauseMonitor:com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 1102ms: no GCs detected.
2016-04-04 21:53:55,157 INFO JvmPauseMonitor:com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 6389ms: GC pool 'ParNew' had collection(s): count=1 time=0ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=6880ms
2016-04-04 21:54:12,226 INFO JvmPauseMonitor:com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 1111ms: no GCs detected.
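To quantify how much time the server spends paused, the JvmPauseMonitor lines can be totalled with a small script. This is only a rough sketch: the function name `summarize_pauses` is hypothetical, and the regular expression assumes the message format shown in the lines above.

```python
import re

# Matches the "paused approximately <N>ms: <details>" part of a JvmPauseMonitor line.
PAUSE_RE = re.compile(r"paused approximately (\d+)ms: (.*)$")

def summarize_pauses(lines):
    """Return (total_pause_ms, gc_attributed_pause_ms) for the given log lines."""
    total_ms = 0
    gc_ms = 0
    for line in lines:
        match = PAUSE_RE.search(line.strip())
        if not match:
            continue  # not a pause report
        pause = int(match.group(1))
        total_ms += pause
        # Pauses reported with "no GCs detected" are not attributed to GC.
        if "no GCs detected" not in match.group(2):
            gc_ms += pause
    return total_ms, gc_ms
```

Feeding it the server log (for example, `summarize_pauses(open(path))` with `path` pointing at the log file) gives the total pause time and the share that the monitor attributes to garbage collection.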
And from the GC logs:
CMS: abort preclean due to time 2016-04-04T21:43:21.873-0700: 733.159: [CMS-concurrent-abortable-preclean: 2.699/5.030 secs] [Times: user=4.27 sys=0.27, real=5.03 secs]
2016-04-04T21:43:21.874-0700: 733.160: [GC[YG occupancy: 267551 K (629248 K)]2016-04-04T21:43:21.874-0700: 733.160: [Rescan (parallel) , 0.0272730 secs]2016-04-04T21:43:21.902-0700: 733.187: [weak refs processing, 0.0000870 secs]2016-04-04T21:43:21.902-0700: 733.187: [scrub string table, 0.0038900 secs] [1 CMS-remark: 1233151K(1443052K)] 1500703K(2072300K), 0.0313960 secs] [Times: user=0.59 sys=0.00, real=0.04 secs]
2016-04-04T21:46:02.021-0700: 893.307: [GC2016-04-04T21:46:02.021-0700: 893.307: [ParNew: 627919K->68971K(629248K), 0.0194360 secs] 9202459K->8712645K(12826296K), 0.0195760 secs] [Times: user=0.39 sys=0.01, real=0.02 secs]
CMS: abort preclean due to time 2016-04-04T21:46:02.780-0700: 894.066: [CMS-concurrent-abortable-preclean: 3.586/5.010 secs] [Times: user=4.94 sys=0.09, real=5.01 secs]
2016-04-04T21:55:39.345-0700: 1470.631: [Full GC2016-04-04T21:55:39.345-0700: 1470.631: [CMS2016-04-04T21:55:40.705-0700: 1471.991: [CMS-concurrent-mark: 1.359/1.362 secs] [Times: user=1.37 sys=0.00, real=1.36 secs]
(concurrent mode failure): 17825660K->17825627K(17825792K), 7.2681860 secs] 20656828K->20656272K(20656960K), [CMS Perm : 131326K->131326K(221484K)], 7.2683460 secs] [Times: user=7.28 sys=0.00, real=7.27 secs]
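The last entry is the most telling one: the Full GC after the concurrent mode failure frees almost nothing. As a rough check, the occupancy figures can be pulled out of that line with a few lines of Python. The helper name `old_gen_after_full_gc` is hypothetical, and the pattern assumes the `before->after(capacity)` layout seen in the log above.

```python
import re

# Matches the first "<before>K-><after>K(<capacity>K)" triple on a GC log line.
OCCUPANCY_RE = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")

def old_gen_after_full_gc(line):
    """Return (freed_kb, fraction_full_after_gc) from the first occupancy triple."""
    match = OCCUPANCY_RE.search(line)
    if match is None:
        raise ValueError("no occupancy figures found in line")
    before, after, capacity = (int(g) for g in match.groups())
    return before - after, after / capacity
```

Applied to the "(concurrent mode failure)" line, the first triple is the CMS old generation, so the result shows how many kilobytes the 7-second collection reclaimed and how full the old generation remained afterwards.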