
Ambari-metrics not displaying metrics


Hello,

I have spent a week tailoring/fine-tuning my HBase heap size and Ambari Metrics heap sizes, which has included everything from starting/stopping services to rebooting the nodes themselves. The recommendations from Ambari are stuck in an infinite loop: as soon as you change a value to the suggested one, it asks you to change it again to something else, and so on. I have green checks on every service in my components section. (The specific heap properties I have been adjusting are sketched after the install details below.) The logistics of my installation are as follows:

Ambari version 2.2.0

IBM IOP for BigInsights 4.2.0 (BigInsights itself is not installed yet; I am just configuring IBM IOP). If you wish to know the painful details of my configuration and installation, I followed the steps here: https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.we...
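For reference, these are the heap-related properties I have been adjusting through the Ambari UI. The values below are only an illustration of the knobs involved (config sections ams-env and ams-hbase-env), not my exact settings, which keep changing with every recommendation:

# ams-env
metrics_collector_heapsize=1024      # AMS collector JVM heap, in MB

# ams-hbase-env (AMS runs its own embedded HBase)
hbase_master_heapsize=4096           # embedded HBase master heap, in MB; the GC log below shows a ~4 GB heap
hbase_master_xmn_size=256            # young-gen size for the master JVM, in MB
hbase_regionserver_heapsize=1024     # only used when AMS runs in distributed mode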

Here is a 100-line tail from each of the logs below:
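(For context, I collected each tail roughly like this; the paths are the defaults on my nodes, and the exact file names, especially the GC log's timestamp suffix, may differ on other installs:

tail -n 100 /var/log/ambari-metrics-collector/hbase-ams-master-$(hostname).log   # embedded HBase master
tail -n 100 /var/log/ambari-metrics-collector/gc.log-*                           # GC log; name varies
tail -n 100 /var/log/ambari-metrics-collector/ambari-metrics-collector.log       # the collector itself
)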

--------------------------------------------------

HBASE AMS-MASTER log:

at java.lang.Thread.run(Thread.java:745)

2017-07-31 09:13:00,810 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /10.199.101.165:58578 (no session established for client)

2017-07-31 09:14:00,811 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /10.199.101.165:58619

2017-07-31 09:14:00,811 WARN[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket

at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)

at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)

at java.lang.Thread.run(Thread.java:745)

2017-07-31 09:14:00,811 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /10.199.101.165:58619 (no session established for client)

2017-07-31 09:14:13,936 INFO[LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.13 GB, freeSize=61.06 MB, max=1.19 GB, blockCount=60930, accesses=63980267, hits=44268111, hitRatio=69.19%, , cachingAccesses=63689585, cachingHits=44206670, cachingHitsRatio=69.41%, evictions=8727, evicted=19219544, evictedPerRun=2202.308349609375

2017-07-31 09:15:00,810 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /10.199.101.165:58663

2017-07-31 09:15:00,810 WARN[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket

at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)

at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)

at java.lang.Thread.run(Thread.java:745)

2017-07-31 09:15:00,810 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /10.199.101.165:58663 (no session established for client)

2017-07-31 09:15:34,712 WARN[B.defaultRpcServer.handler=25,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924619,"responsesize":9,"method":"Scan","processingtimems":10093,"client":"10.199.101.165:38854","queuetimems":6,"class":"HRegionServer"}

2017-07-31 09:15:35,074 WARN[B.defaultRpcServer.handler=8,queue=2,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924605,"responsesize":9,"method":"Scan","processingtimems":10469,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:35,130 WARN[B.defaultRpcServer.handler=26,queue=2,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924944,"responsesize":1837,"method":"Scan","processingtimems":10186,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:35,905 WARN[B.defaultRpcServer.handler=3,queue=0,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924679,"responsesize":9,"method":"Scan","processingtimems":11226,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:36,760 WARN[B.defaultRpcServer.handler=10,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506926453,"responsesize":1817,"method":"Scan","processingtimems":10307,"client":"10.199.101.165:38854","queuetimems":1509,"class":"HRegionServer"}

2017-07-31 09:15:36,763 WARN[B.defaultRpcServer.handler=0,queue=0,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924607,"responsesize":9,"method":"Scan","processingtimems":12156,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:36,903 WARN[B.defaultRpcServer.handler=9,queue=0,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506926454,"responsesize":2021,"method":"Scan","processingtimems":10449,"client":"10.199.101.165:38854","queuetimems":1510,"class":"HRegionServer"}

2017-07-31 09:15:36,908 WARN[B.defaultRpcServer.handler=18,queue=0,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924606,"responsesize":9,"method":"Scan","processingtimems":12302,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:37,306 WARN[B.defaultRpcServer.handler=22,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924605,"responsesize":9,"method":"Scan","processingtimems":12701,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:37,364 WARN[B.defaultRpcServer.handler=4,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924610,"responsesize":9,"method":"Scan","processingtimems":12754,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:37,389 WARN[B.defaultRpcServer.handler=13,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924684,"responsesize":9,"method":"Scan","processingtimems":12705,"client":"10.199.101.165:38854","queuetimems":5,"class":"HRegionServer"}

2017-07-31 09:15:38,202 WARN[B.defaultRpcServer.handler=7,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924759,"responsesize":9,"method":"Scan","processingtimems":13443,"client":"10.199.101.165:38854","queuetimems":24,"class":"HRegionServer"}

2017-07-31 09:15:38,280 WARN[B.defaultRpcServer.handler=29,queue=2,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924748,"responsesize":9,"method":"Scan","processingtimems":13532,"client":"10.199.101.165:38854","queuetimems":13,"class":"HRegionServer"}

2017-07-31 09:15:38,418 WARN[B.defaultRpcServer.handler=15,queue=0,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924718,"responsesize":9,"method":"Scan","processingtimems":13700,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:38,516 WARN[B.defaultRpcServer.handler=16,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924726,"responsesize":9,"method":"Scan","processingtimems":13790,"client":"10.199.101.165:38854","queuetimems":8,"class":"HRegionServer"}

2017-07-31 09:15:38,812 WARN[B.defaultRpcServer.handler=27,queue=0,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924735,"responsesize":9,"method":"Scan","processingtimems":14077,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:38,828 WARN[B.defaultRpcServer.handler=12,queue=0,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924718,"responsesize":9,"method":"Scan","processingtimems":14109,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:38,848 WARN[B.defaultRpcServer.handler=28,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924735,"responsesize":9,"method":"Scan","processingtimems":14113,"client":"10.199.101.165:38854","queuetimems":0,"class":"HRegionServer"}

2017-07-31 09:15:56,880 WARN[B.defaultRpcServer.handler=19,queue=1,port=48111] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1501506924746,"responsesize":9,"method":"Scan","processingtimems":32134,"client":"10.199.101.165:38854","queuetimems":11,"class":"HRegionServer"}

2017-07-31 09:16:00,810 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /10.199.101.165:58708

2017-07-31 09:16:00,811 WARN[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket

at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)

at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)

at java.lang.Thread.run(Thread.java:745)

2017-07-31 09:16:00,811 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /10.199.101.165:58708 (no session established for client)

2017-07-31 09:16:04,007 INFO[bi-node-3.novalocal,48111,1501475653425_ChoreService_1] regionserver.HRegionServer: bi-node-3.novalocal,48111,1501475653425-MemstoreFlusherChore requesting flush for region METRIC_AGGREGATE_MINUTE,,1501256756592.e9f205c605adc76c13ee54a798ae913c. after a delay of 15186

2017-07-31 09:16:14,007 INFO[bi-node-3.novalocal,48111,1501475653425_ChoreService_1] regionserver.HRegionServer: bi-node-3.novalocal,48111,1501475653425-MemstoreFlusherChore requesting flush for region METRIC_AGGREGATE_MINUTE,,1501256756592.e9f205c605adc76c13ee54a798ae913c. after a delay of 5510

2017-07-31 09:16:17,792 INFO[regionserver/bi-node-3.novalocal/10.199.101.165:0.logRoller] wal.FSHLog: Rolled WAL /var/lib/ambari-metrics-collector/hbase/WALs/bi-node-3.novalocal,48111,1501475653425/bi-node-3.novalocal%2C48111%2C1501475653425.default.1501506612651 with entries=653, filesize=30.41 MB; new WAL /var/lib/ambari-metrics-collector/hbase/WALs/bi-node-3.novalocal,48111,1501475653425/bi-node-3.novalocal%2C48111%2C1501475653425.default.1501506977790

2017-07-31 09:16:17,792 INFO[regionserver/bi-node-3.novalocal/10.199.101.165:0.logRoller] wal.FSHLog: Archiving file:/var/lib/ambari-metrics-collector/hbase/WALs/bi-node-3.novalocal,48111,1501475653425/bi-node-3.novalocal%2C48111%2C1501475653425.default.1501502699232 to file:/var/lib/ambari-metrics-collector/hbase/oldWALs/bi-node-3.novalocal%2C48111%2C1501475653425.default.1501502699232

2017-07-31 09:16:19,193 INFO[MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for METRIC_AGGREGATE_MINUTE,,1501256756592.e9f205c605adc76c13ee54a798ae913c., current region memstore size 27.59 MB, and 1/1 column families' memstores are being flushed.

2017-07-31 09:16:19,368 INFO[MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1291, memsize=27.6 M, hasBloomFilter=true, into tmp file file:/var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE_MINUTE/e9f205c605adc76c13ee54a798ae913c/.tmp/9592433d33dc43d98e3418827794434e

2017-07-31 09:16:19,370 INFO[MemStoreFlusher.1] regionserver.HStore: Added file:/var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE_MINUTE/e9f205c605adc76c13ee54a798ae913c/0/9592433d33dc43d98e3418827794434e, entries=127380, sequenceid=1291, filesize=808.8 K

2017-07-31 09:16:19,371 INFO[MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~27.59 MB/28934616, currentsize=0 B/0 for region METRIC_AGGREGATE_MINUTE,,1501256756592.e9f205c605adc76c13ee54a798ae913c. in 178ms, sequenceid=1291, compaction requested=true

2017-07-31 09:16:19,372 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] regionserver.HRegion: Starting compaction on 0 in region METRIC_AGGREGATE_MINUTE,,1501256756592.e9f205c605adc76c13ee54a798ae913c.

2017-07-31 09:16:19,372 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] regionserver.HStore: Starting compaction of 3 file(s) in 0 of METRIC_AGGREGATE_MINUTE,,1501256756592.e9f205c605adc76c13ee54a798ae913c. into tmpdir=file:/var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE_MINUTE/e9f205c605adc76c13ee54a798ae913c/.tmp, totalSize=33.9 M

2017-07-31 09:16:19,375 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x201f3c9b connecting to ZooKeeper ensemble=localhost:61181

2017-07-31 09:16:19,375 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:61181 sessionTimeout=120000 watcher=hconnection-0x201f3c9b0x0, quorum=localhost:61181, baseZNode=/hbase

2017-07-31 09:16:19,376 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923-SendThread(localhost:61181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)

2017-07-31 09:16:19,377 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923-SendThread(localhost:61181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:61181, initiating session

2017-07-31 09:16:19,377 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:39928

2017-07-31 09:16:19,378 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:39928

2017-07-31 09:16:19,380 INFO[SyncThread:0] server.ZooKeeperServer: Established session 0x15d96ec4153004e with negotiated timeout 40000 for client /127.0.0.1:39928

2017-07-31 09:16:19,381 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923-SendThread(localhost:61181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:61181, sessionid = 0x15d96ec4153004e, negotiated timeout = 40000

2017-07-31 09:16:19,394 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=76488, currentSize=1216418144, freeSize=64030880, maxSize=1280449024, heapSize=1216418144, minSize=1216426496, minFactor=0.95, multiSize=608213248, multiFactor=0.5, singleSize=304106624, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false

2017-07-31 09:16:25,113 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x15d96ec4153004e

2017-07-31 09:16:25,113 INFO[ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Processed session termination for sessionid: 0x15d96ec4153004e

2017-07-31 09:16:25,114 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] zookeeper.ZooKeeper: Session: 0x15d96ec4153004e closed

2017-07-31 09:16:25,114 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:39928 which had sessionid 0x15d96ec4153004e

2017-07-31 09:16:25,115 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923-EventThread] zookeeper.ClientCnxn: EventThread shut down

2017-07-31 09:16:25,140 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] regionserver.HStore: Completed compaction of 3 (all) file(s) in 0 of METRIC_AGGREGATE_MINUTE,,1501256756592.e9f205c605adc76c13ee54a798ae913c. into 3afa40585eb44dd981ebbe2e99eb9549(size=34.5 M), total size for store is 34.5 M. This selection was in queue for 0sec, and took 5sec to execute.

2017-07-31 09:16:25,140 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659923] regionserver.CompactSplitThread: Completed compaction: Request = regionName=METRIC_AGGREGATE_MINUTE,,1501256756592.e9f205c605adc76c13ee54a798ae913c., storeName=0, fileCount=3, fileSize=33.9 M, priority=197, time=253688944552319; duration=5sec

2017-07-31 09:17:00,806 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /10.199.101.165:58756

2017-07-31 09:17:00,806 WARN[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket

at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)

at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)

at java.lang.Thread.run(Thread.java:745)

2017-07-31 09:17:00,806 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /10.199.101.165:58756 (no session established for client)

2017-07-31 09:18:00,807 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /10.199.101.165:58809

2017-07-31 09:18:00,808 WARN[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket

at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)

at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)

at java.lang.Thread.run(Thread.java:745)

2017-07-31 09:18:00,808 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /10.199.101.165:58809 (no session established for client)

2017-07-31 09:18:48,360 INFO[MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for METRIC_RECORD,mapred.ShuffleMetrics.ShuffleOutputsFailed\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1501256749795.b9cbc3e103887b70bb6c604d3ee5cefb., current region memstore size 128.01 MB, and 1/1 column families' memstores are being flushed.

2017-07-31 09:18:49,057 INFO[MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=658027, memsize=128.0 M, hasBloomFilter=true, into tmp file file:/var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/b9cbc3e103887b70bb6c604d3ee5cefb/.tmp/c2772d249b584e2fb541e09a0505e2d8

2017-07-31 09:18:49,060 INFO[MemStoreFlusher.1] regionserver.HStore: Added file:/var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/b9cbc3e103887b70bb6c604d3ee5cefb/0/c2772d249b584e2fb541e09a0505e2d8, entries=520096, sequenceid=658027, filesize=7.7 M

2017-07-31 09:18:49,060 INFO[MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~128.01 MB/134230944, currentsize=0 B/0 for region METRIC_RECORD,mapred.ShuffleMetrics.ShuffleOutputsFailed\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1501256749795.b9cbc3e103887b70bb6c604d3ee5cefb. in 700ms, sequenceid=658027, compaction requested=true

2017-07-31 09:18:49,061 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659813] regionserver.HRegion: Starting compaction on 0 in region METRIC_RECORD,mapred.ShuffleMetrics.ShuffleOutputsFailed\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1501256749795.b9cbc3e103887b70bb6c604d3ee5cefb.

2017-07-31 09:18:49,061 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659813] regionserver.HStore: Starting compaction of 3 file(s) in 0 of METRIC_RECORD,mapred.ShuffleMetrics.ShuffleOutputsFailed\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1501256749795.b9cbc3e103887b70bb6c604d3ee5cefb. into tmpdir=file:/var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/b9cbc3e103887b70bb6c604d3ee5cefb/.tmp, totalSize=107.7 M

2017-07-31 09:18:49,063 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659813] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=60623, currentSize=1216414808, freeSize=64034216, maxSize=1280449024, heapSize=1216414808, minSize=1216426496, minFactor=0.95, multiSize=608213248, multiFactor=0.5, singleSize=304106624, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false

2017-07-31 09:18:58,459 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659813] regionserver.HStore: Completed compaction of 3 file(s) in 0 of METRIC_RECORD,mapred.ShuffleMetrics.ShuffleOutputsFailed\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1501256749795.b9cbc3e103887b70bb6c604d3ee5cefb. into 4796a7f10d954ef5ae8ce9c461eb7b43(size=107.4 M), total size for store is 374.1 M. This selection was in queue for 0sec, and took 9sec to execute.

2017-07-31 09:18:58,460 INFO[RS:0;bi-node-3:48111-shortCompactions-1501475659813] regionserver.CompactSplitThread: Completed compaction: Request = regionName=METRIC_RECORD,mapred.ShuffleMetrics.ShuffleOutputsFailed\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1501256749795.b9cbc3e103887b70bb6c604d3ee5cefb., storeName=0, fileCount=3, fileSize=107.7 M, priority=196, time=253838633242200; duration=9sec

2017-07-31 09:19:00,823 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /10.199.101.165:58850

2017-07-31 09:19:00,824 WARN[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket

at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)

at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)

at java.lang.Thread.run(Thread.java:745)

2017-07-31 09:19:00,825 INFO[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /10.199.101.165:58850 (no session established for client)

2017-07-31 09:19:13,936 INFO[LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=921.10 MB, freeSize=300.03 MB, max=1.19 GB, blockCount=48130, accesses=64583340, hits=44659230, hitRatio=69.15%, , cachingAccesses=64271828, cachingHits=44585297, cachingHitsRatio=69.37%, evictions=8816, evicted=19435128, evictedPerRun=2204.529052734375
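The responseTooSlow scans above are what worry me most: every one of them is a Scan issued by the collector itself (10.199.101.165:38854) taking 10-32 seconds. I pulled the processing times out with a quick one-liner (assumes GNU grep; the JSON-ish payload is all on one line, and the log file name is from my node):

grep -o '"processingtimems":[0-9]*' hbase-ams-master-bi-node-3.log | cut -d: -f2 | sort -n | tail -5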

--------------------------------------------------

GC Log:

2017-07-31T09:21:24.946-0400: 31633.729: [GC (Allocation Failure) 2017-07-31T09:21:24.946-0400: 31633.729: [ParNew: 235968K->26176K(235968K), 0.0140092 secs] 2460709K->2260655K(4168128K), 0.0141841 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]

2017-07-31T09:21:24.991-0400: 31633.773: [GC (Allocation Failure) 2017-07-31T09:21:24.991-0400: 31633.773: [ParNew: 235968K->26176K(235968K), 0.0093508 secs] 2470447K->2271424K(4168128K), 0.0095246 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.032-0400: 31633.815: [GC (Allocation Failure) 2017-07-31T09:21:25.032-0400: 31633.815: [ParNew: 235968K->26176K(235968K), 0.0154670 secs] 2481216K->2282807K(4168128K), 0.0156377 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.082-0400: 31633.864: [GC (Allocation Failure) 2017-07-31T09:21:25.082-0400: 31633.864: [ParNew: 235968K->26176K(235968K), 0.0121024 secs] 2492599K->2293580K(4168128K), 0.0122901 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.124-0400: 31633.906: [GC (Allocation Failure) 2017-07-31T09:21:25.124-0400: 31633.906: [ParNew: 235968K->26176K(235968K), 0.0101913 secs] 2503372K->2304845K(4168128K), 0.0103633 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.168-0400: 31633.951: [GC (Allocation Failure) 2017-07-31T09:21:25.168-0400: 31633.951: [ParNew: 235968K->26176K(235968K), 0.0105573 secs] 2514637K->2314690K(4168128K), 0.0108753 secs] [Times: user=0.07 sys=0.01, real=0.01 secs]

2017-07-31T09:21:25.212-0400: 31633.995: [GC (Allocation Failure) 2017-07-31T09:21:25.212-0400: 31633.995: [ParNew: 235968K->26176K(235968K), 0.0118154 secs] 2524482K->2324806K(4168128K), 0.0119979 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.257-0400: 31634.040: [GC (Allocation Failure) 2017-07-31T09:21:25.257-0400: 31634.040: [ParNew: 235968K->26176K(235968K), 0.0117306 secs] 2534598K->2336029K(4168128K), 0.0119117 secs] [Times: user=0.05 sys=0.01, real=0.01 secs]

2017-07-31T09:21:25.303-0400: 31634.086: [GC (Allocation Failure) 2017-07-31T09:21:25.304-0400: 31634.086: [ParNew: 235968K->26176K(235968K), 0.0112086 secs] 2545821K->2346745K(4168128K), 0.0113835 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.346-0400: 31634.129: [GC (Allocation Failure) 2017-07-31T09:21:25.346-0400: 31634.129: [ParNew: 235968K->26176K(235968K), 0.0073820 secs] 2556537K->2356128K(4168128K), 0.0075561 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.397-0400: 31634.180: [GC (Allocation Failure) 2017-07-31T09:21:25.397-0400: 31634.180: [ParNew: 235968K->26176K(235968K), 0.0078254 secs] 2565920K->2366001K(4168128K), 0.0080379 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.434-0400: 31634.216: [GC (Allocation Failure) 2017-07-31T09:21:25.434-0400: 31634.216: [ParNew: 235968K->26176K(235968K), 0.0080872 secs] 2575793K->2376675K(4168128K), 0.0082744 secs] [Times: user=0.05 sys=0.01, real=0.01 secs]

2017-07-31T09:21:25.473-0400: 31634.256: [GC (Allocation Failure) 2017-07-31T09:21:25.473-0400: 31634.256: [ParNew: 235968K->26176K(235968K), 0.0091703 secs] 2586467K->2387265K(4168128K), 0.0093446 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.519-0400: 31634.301: [GC (Allocation Failure) 2017-07-31T09:21:25.519-0400: 31634.301: [ParNew: 235968K->26176K(235968K), 0.0083400 secs] 2597057K->2398252K(4168128K), 0.0085299 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.557-0400: 31634.340: [GC (Allocation Failure) 2017-07-31T09:21:25.557-0400: 31634.340: [ParNew: 235968K->26176K(235968K), 0.0081181 secs] 2608044K->2407906K(4168128K), 0.0083129 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.597-0400: 31634.379: [GC (Allocation Failure) 2017-07-31T09:21:25.597-0400: 31634.379: [ParNew: 235968K->26176K(235968K), 0.0216765 secs] 2617698K->2417527K(4168128K), 0.0219612 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]

2017-07-31T09:21:25.668-0400: 31634.450: [GC (Allocation Failure) 2017-07-31T09:21:25.668-0400: 31634.450: [ParNew: 235968K->26176K(235968K), 0.0103860 secs] 2627319K->2427370K(4168128K), 0.0106231 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.705-0400: 31634.488: [GC (Allocation Failure) 2017-07-31T09:21:25.705-0400: 31634.488: [ParNew: 235968K->26176K(235968K), 0.0077334 secs] 2637162K->2437273K(4168128K), 0.0079185 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]

2017-07-31T09:21:25.743-0400: 31634.525: [GC (Allocation Failure) 2017-07-31T09:21:25.743-0400: 31634.525: [ParNew: 235968K->26176K(235968K), 0.0082894 secs] 2647065K->2447416K(4168128K), 0.0084749 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.786-0400: 31634.568: [GC (Allocation Failure) 2017-07-31T09:21:25.786-0400: 31634.568: [ParNew: 235968K->26176K(235968K), 0.0088423 secs] 2657208K->2457283K(4168128K), 0.0090160 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.825-0400: 31634.607: [GC (Allocation Failure) 2017-07-31T09:21:25.825-0400: 31634.607: [ParNew: 235968K->26176K(235968K), 0.0078932 secs] 2667075K->2467853K(4168128K), 0.0080690 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]

2017-07-31T09:21:25.863-0400: 31634.645: [GC (Allocation Failure) 2017-07-31T09:21:25.863-0400: 31634.645: [ParNew: 235968K->24451K(235968K), 0.0109223 secs] 2677645K->2476236K(4168128K), 0.0110977 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.907-0400: 31634.689: [GC (Allocation Failure) 2017-07-31T09:21:25.907-0400: 31634.689: [ParNew: 234243K->26176K(235968K), 0.0091598 secs] 2686028K->2487792K(4168128K), 0.0093632 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.949-0400: 31634.732: [GC (Allocation Failure) 2017-07-31T09:21:25.949-0400: 31634.732: [ParNew: 235968K->26176K(235968K), 0.0083227 secs] 2697584K->2497141K(4168128K), 0.0085077 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:25.992-0400: 31634.775: [GC (Allocation Failure) 2017-07-31T09:21:25.992-0400: 31634.775: [ParNew: 235968K->26176K(235968K), 0.0071822 secs] 2706933K->2506202K(4168128K), 0.0073809 secs] [Times: user=0.05 sys=0.00, real=0.00 secs]

2017-07-31T09:21:26.032-0400: 31634.815: [GC (Allocation Failure) 2017-07-31T09:21:26.033-0400: 31634.815: [ParNew: 235968K->26176K(235968K), 0.0075875 secs] 2715994K->2514960K(4168128K), 0.0077958 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.080-0400: 31634.863: [GC (Allocation Failure) 2017-07-31T09:21:26.081-0400: 31634.863: [ParNew: 235968K->26176K(235968K), 0.0104342 secs] 2724752K->2523827K(4168128K), 0.0106189 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.127-0400: 31634.909: [GC (Allocation Failure) 2017-07-31T09:21:26.127-0400: 31634.909: [ParNew: 235968K->26176K(235968K), 0.0073320 secs] 2733619K->2532736K(4168128K), 0.0075197 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]

2017-07-31T09:21:26.172-0400: 31634.954: [GC (Allocation Failure) 2017-07-31T09:21:26.172-0400: 31634.954: [ParNew: 235968K->26176K(235968K), 0.0116195 secs] 2742528K->2541665K(4168128K), 0.0119190 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.224-0400: 31635.007: [GC (Allocation Failure) 2017-07-31T09:21:26.224-0400: 31635.007: [ParNew: 235968K->26176K(235968K), 0.0070878 secs] 2751457K->2550914K(4168128K), 0.0073116 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.275-0400: 31635.058: [GC (Allocation Failure) 2017-07-31T09:21:26.276-0400: 31635.058: [ParNew: 235968K->26176K(235968K), 0.0075735 secs] 2760706K->2560026K(4168128K), 0.0077517 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.323-0400: 31635.105: [GC (Allocation Failure) 2017-07-31T09:21:26.323-0400: 31635.105: [ParNew: 235968K->26176K(235968K), 0.0069554 secs] 2769818K->2568871K(4168128K), 0.0071322 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:26.365-0400: 31635.147: [GC (Allocation Failure) 2017-07-31T09:21:26.365-0400: 31635.147: [ParNew: 235968K->26176K(235968K), 0.0076438 secs] 2778663K->2578225K(4168128K), 0.0078294 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.416-0400: 31635.198: [GC (Allocation Failure) 2017-07-31T09:21:26.416-0400: 31635.198: [ParNew: 235968K->25265K(235968K), 0.0066747 secs] 2788017K->2585822K(4168128K), 0.0068462 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.460-0400: 31635.243: [GC (Allocation Failure) 2017-07-31T09:21:26.461-0400: 31635.243: [ParNew: 235057K->24725K(235968K), 0.0067868 secs] 2795614K->2593910K(4168128K), 0.0069744 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:26.507-0400: 31635.290: [GC (Allocation Failure) 2017-07-31T09:21:26.507-0400: 31635.290: [ParNew: 234517K->23904K(235968K), 0.0063449 secs] 2803702K->2600995K(4168128K), 0.0065467 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]

2017-07-31T09:21:26.558-0400: 31635.340: [GC (Allocation Failure) 2017-07-31T09:21:26.558-0400: 31635.340: [ParNew: 233696K->26145K(235968K), 0.0068133 secs] 2810787K->2611189K(4168128K), 0.0069979 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.605-0400: 31635.388: [GC (Allocation Failure) 2017-07-31T09:21:26.606-0400: 31635.388: [ParNew: 235937K->19068K(235968K), 0.0064140 secs] 2820981K->2612259K(4168128K), 0.0065927 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.664-0400: 31635.446: [GC (Allocation Failure) 2017-07-31T09:21:26.664-0400: 31635.446: [ParNew: 228860K->23616K(235968K), 0.0097408 secs] 2822051K->2624140K(4168128K), 0.0099143 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.723-0400: 31635.506: [GC (Allocation Failure) 2017-07-31T09:21:26.723-0400: 31635.506: [ParNew: 233408K->15772K(235968K), 0.0094609 secs] 2833932K->2622978K(4168128K), 0.0096335 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.787-0400: 31635.569: [GC (Allocation Failure) 2017-07-31T09:21:26.787-0400: 31635.569: [ParNew: 225564K->21776K(235968K), 0.0062274 secs] 2832770K->2628982K(4168128K), 0.0064084 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.840-0400: 31635.623: [GC (Allocation Failure) 2017-07-31T09:21:26.840-0400: 31635.623: [ParNew: 231568K->22859K(235968K), 0.0059736 secs] 2838774K->2635875K(4168128K), 0.0061426 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:26.900-0400: 31635.682: [GC (Allocation Failure) 2017-07-31T09:21:26.900-0400: 31635.682: [ParNew: 232651K->26165K(235968K), 0.0088571 secs] 2845667K->2644705K(4168128K), 0.0090767 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]

2017-07-31T09:21:26.961-0400: 31635.744: [GC (Allocation Failure) 2017-07-31T09:21:26.961-0400: 31635.744: [ParNew: 235957K->19457K(235968K), 0.0056017 secs] 2854497K->2642918K(4168128K), 0.0057936 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:27.025-0400: 31635.807: [GC (Allocation Failure) 2017-07-31T09:21:27.025-0400: 31635.807: [ParNew: 229249K->21604K(235968K), 0.0081818 secs] 2852710K->2650009K(4168128K), 0.0083806 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.087-0400: 31635.870: [GC (Allocation Failure) 2017-07-31T09:21:27.087-0400: 31635.870: [ParNew: 231396K->22503K(235968K), 0.0153401 secs] 2859801K->2655823K(4168128K), 0.0155008 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]

2017-07-31T09:21:27.172-0400: 31635.954: [GC (Allocation Failure) 2017-07-31T09:21:27.172-0400: 31635.954: [ParNew: 232295K->22297K(235968K), 0.0146531 secs] 2865615K->2660416K(4168128K), 0.0149276 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.236-0400: 31636.019: [GC (Allocation Failure) 2017-07-31T09:21:27.236-0400: 31636.019: [ParNew: 232089K->22692K(235968K), 0.0118697 secs] 2870208K->2664996K(4168128K), 0.0120675 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.309-0400: 31636.091: [GC (Allocation Failure) 2017-07-31T09:21:27.309-0400: 31636.091: [ParNew: 232484K->22239K(235968K), 0.0060900 secs] 2874788K->2669076K(4168128K), 0.0062881 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.367-0400: 31636.149: [GC (Allocation Failure) 2017-07-31T09:21:27.367-0400: 31636.150: [ParNew: 232031K->22345K(235968K), 0.0064762 secs] 2878868K->2673922K(4168128K), 0.0066583 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.427-0400: 31636.210: [GC (Allocation Failure) 2017-07-31T09:21:27.427-0400: 31636.210: [ParNew: 232137K->18132K(235968K), 0.0055581 secs] 2883714K->2674994K(4168128K), 0.0057488 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.490-0400: 31636.272: [GC (Allocation Failure) 2017-07-31T09:21:27.490-0400: 31636.272: [ParNew: 227924K->26176K(235968K), 0.0048982 secs] 2884786K->2683038K(4168128K), 0.0050689 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.550-0400: 31636.333: [GC (Allocation Failure) 2017-07-31T09:21:27.550-0400: 31636.333: [ParNew: 235968K->26176K(235968K), 0.0056437 secs] 2892830K->2687135K(4168128K), 0.0058164 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:27.610-0400: 31636.393: [GC (Allocation Failure) 2017-07-31T09:21:27.610-0400: 31636.393: [ParNew: 235968K->21497K(235968K), 0.0066754 secs] 2896927K->2690431K(4168128K), 0.0068501 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:27.667-0400: 31636.450: [GC (Allocation Failure) 2017-07-31T09:21:27.667-0400: 31636.450: [ParNew: 231289K->24763K(235968K), 0.0094092 secs] 2900223K->2697811K(4168128K), 0.0095637 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.724-0400: 31636.506: [GC (Allocation Failure) 2017-07-31T09:21:27.724-0400: 31636.506: [ParNew: 234555K->23128K(235968K), 0.0091785 secs] 2907603K->2700950K(4168128K), 0.0093470 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.783-0400: 31636.565: [GC (Allocation Failure) 2017-07-31T09:21:27.783-0400: 31636.565: [ParNew: 232837K->26176K(235968K), 0.0164058 secs] 2910660K->2708393K(4168128K), 0.0167509 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.865-0400: 31636.647: [GC (Allocation Failure) 2017-07-31T09:21:27.865-0400: 31636.647: [ParNew: 235968K->25913K(235968K), 0.0099162 secs] 2918185K->2713138K(4168128K), 0.0101338 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:27.921-0400: 31636.703: [GC (Allocation Failure) 2017-07-31T09:21:27.921-0400: 31636.703: [ParNew: 235705K->26176K(235968K), 0.0065934 secs] 2922930K->2719037K(4168128K), 0.0067690 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:27.977-0400: 31636.759: [GC (Allocation Failure) 2017-07-31T09:21:27.977-0400: 31636.759: [ParNew: 235968K->26176K(235968K), 0.0058274 secs] 2928829K->2724896K(4168128K), 0.0059977 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.042-0400: 31636.824: [GC (Allocation Failure) 2017-07-31T09:21:28.042-0400: 31636.824: [ParNew: 235968K->26076K(235968K), 0.0086151 secs] 2934688K->2730136K(4168128K), 0.0088495 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.094-0400: 31636.877: [GC (Allocation Failure) 2017-07-31T09:21:28.095-0400: 31636.877: [ParNew: 235868K->26176K(235968K), 0.0113248 secs] 2939928K->2736036K(4168128K), 0.0114988 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.147-0400: 31636.929: [GC (Allocation Failure) 2017-07-31T09:21:28.147-0400: 31636.929: [ParNew: 235968K->19042K(235968K), 0.0078802 secs] 2945828K->2739558K(4168128K), 0.0080535 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]

2017-07-31T09:21:28.193-0400: 31636.975: [GC (Allocation Failure) 2017-07-31T09:21:28.193-0400: 31636.975: [ParNew: 228834K->23589K(235968K), 0.0067221 secs] 2949350K->2751682K(4168128K), 0.0069093 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:28.237-0400: 31637.020: [GC (Allocation Failure) 2017-07-31T09:21:28.237-0400: 31637.020: [ParNew: 233381K->26176K(235968K), 0.0066002 secs] 2961474K->2761945K(4168128K), 0.0067759 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.282-0400: 31637.064: [GC (Allocation Failure) 2017-07-31T09:21:28.282-0400: 31637.064: [ParNew: 235968K->23969K(235968K), 0.0110171 secs] 2971737K->2767570K(4168128K), 0.0111952 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.326-0400: 31637.108: [GC (Allocation Failure) 2017-07-31T09:21:28.326-0400: 31637.108: [ParNew: 233761K->23073K(235968K), 0.0086335 secs] 2977362K->2774879K(4168128K), 0.0088013 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.370-0400: 31637.152: [GC (Allocation Failure) 2017-07-31T09:21:28.370-0400: 31637.152: [ParNew: 232865K->26176K(235968K), 0.0111989 secs] 2984671K->2787102K(4168128K), 0.0113699 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]

2017-07-31T09:21:28.394-0400: 31637.177: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2760926K(3932160K)] 2863852K(4168128K), 0.0030493 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]

2017-07-31T09:21:28.398-0400: 31637.180: [CMS-concurrent-mark-start]

2017-07-31T09:21:28.429-0400: 31637.211: [GC (Allocation Failure) 2017-07-31T09:21:28.429-0400: 31637.211: [ParNew: 235968K->26176K(235968K), 0.0123220 secs] 2996894K->2795701K(4168128K), 0.0125051 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]

2017-07-31T09:21:28.484-0400: 31637.267: [GC (Allocation Failure) 2017-07-31T09:21:28.485-0400: 31637.267: [ParNew: 235968K->26176K(235968K), 0.0148564 secs] 3005493K->2804224K(4168128K), 0.0151330 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.567-0400: 31637.349: [GC (Allocation Failure) 2017-07-31T09:21:28.567-0400: 31637.349: [ParNew: 235968K->26176K(235968K), 0.0102227 secs] 3014016K->2813220K(4168128K), 0.0104645 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.625-0400: 31637.408: [GC (Allocation Failure) 2017-07-31T09:21:28.625-0400: 31637.408: [ParNew: 235968K->24898K(235968K), 0.0142528 secs] 3023012K->2820405K(4168128K), 0.0144247 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.683-0400: 31637.465: [GC (Allocation Failure) 2017-07-31T09:21:28.683-0400: 31637.465: [ParNew: 234690K->26176K(235968K), 0.0063668 secs] 3030197K->2829797K(4168128K), 0.0065428 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:28.690-0400: 31637.472: [CMS-concurrent-mark: 0.215/0.293 secs] [Times: user=1.96 sys=0.02, real=0.29 secs]

2017-07-31T09:21:28.690-0400: 31637.472: [CMS-concurrent-preclean-start]

2017-07-31T09:21:28.706-0400: 31637.489: [CMS-concurrent-preclean: 0.015/0.016 secs] [Times: user=0.10 sys=0.00, real=0.02 secs]

2017-07-31T09:21:28.706-0400: 31637.489: [CMS-concurrent-abortable-preclean-start]

2017-07-31T09:21:28.717-0400: 31637.499: [CMS-concurrent-abortable-preclean: 0.010/0.010 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.720-0400: 31637.502: [GC (CMS Final Remark) [YG occupancy: 150691 K (235968 K)]2017-07-31T09:21:28.720-0400: 31637.502: [Rescan (parallel) , 0.0044312 secs]2017-07-31T09:21:28.725-0400: 31637.507: [weak refs processing, 0.0000361 secs]2017-07-31T09:21:28.725-0400: 31637.507: [class unloading, 0.0159260 secs]2017-07-31T09:21:28.741-0400: 31637.523: [scrub symbol table, 0.0047738 secs]2017-07-31T09:21:28.745-0400: 31637.528: [scrub string table, 0.0007529 secs][1 CMS-remark: 2803621K(3932160K)] 2954312K(4168128K), 0.0260824 secs] [Times: user=0.05 sys=0.01, real=0.03 secs]

2017-07-31T09:21:28.746-0400: 31637.529: [CMS-concurrent-sweep-start]

2017-07-31T09:21:28.764-0400: 31637.546: [GC (Allocation Failure) 2017-07-31T09:21:28.764-0400: 31637.547: [ParNew: 235968K->24488K(235968K), 0.0118600 secs] 3026671K->2823557K(4168128K), 0.0120304 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.814-0400: 31637.596: [GC (Allocation Failure) 2017-07-31T09:21:28.814-0400: 31637.596: [ParNew: 234280K->20023K(235968K), 0.0065302 secs] 2836864K->2630726K(4168128K), 0.0067050 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.860-0400: 31637.643: [GC (Allocation Failure) 2017-07-31T09:21:28.860-0400: 31637.643: [ParNew: 229815K->26176K(235968K), 0.0063873 secs] 2618902K->2423042K(4168128K), 0.0065509 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:28.906-0400: 31637.689: [GC (Allocation Failure) 2017-07-31T09:21:28.906-0400: 31637.689: [ParNew: 235968K->26176K(235968K), 0.0067254 secs] 2458627K->2257276K(4168128K), 0.0069130 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.948-0400: 31637.731: [GC (Allocation Failure) 2017-07-31T09:21:28.948-0400: 31637.731: [ParNew: 235968K->26176K(235968K), 0.0067188 secs] 2334968K->2134585K(4168128K), 0.0069048 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:28.993-0400: 31637.776: [GC (Allocation Failure) 2017-07-31T09:21:28.993-0400: 31637.776: [ParNew: 235968K->26176K(235968K), 0.0066047 secs] 2225242K->2025135K(4168128K), 0.0067894 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2017-07-31T09:21:29.037-0400: 31637.819: [GC (Allocation Failure) 2017-07-31T09:21:29.037-0400: 31637.819: [ParNew: 235968K->26176K(235968K), 0.0089025 secs] 2095872K->1896438K(4168128K), 0.0090755 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:29.088-0400: 31637.870: [GC (Allocation Failure) 2017-07-31T09:21:29.088-0400: 31637.870: [ParNew: 235722K->26176K(235968K), 0.0134455 secs] 1922843K->1722909K(4168128K), 0.0136126 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]

2017-07-31T09:21:29.149-0400: 31637.932: [GC (Allocation Failure) 2017-07-31T09:21:29.150-0400: 31637.932: [ParNew: 235968K->26176K(235968K), 0.0120064 secs] 1740377K->1541773K(4168128K), 0.0121990 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]

2017-07-31T09:21:29.166-0400: 31637.948: [CMS-concurrent-sweep: 0.329/0.419 secs] [Times: user=2.85 sys=0.04, real=0.42 secs]

2017-07-31T09:21:29.166-0400: 31637.948: [CMS-concurrent-reset-start]

2017-07-31T09:21:29.175-0400: 31637.957: [CMS-concurrent-reset: 0.009/0.009 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:29.210-0400: 31637.992: [GC (Allocation Failure) 2017-07-31T09:21:29.210-0400: 31637.992: [ParNew: 235968K->26176K(235968K), 0.0112487 secs] 1743425K->1545019K(4168128K), 0.0114311 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]

2017-07-31T09:21:29.270-0400: 31638.052: [GC (Allocation Failure) 2017-07-31T09:21:29.270-0400: 31638.052: [ParNew: 235968K->26176K(235968K), 0.0136998 secs] 1754811K->1557714K(4168128K), 0.0138779 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]

2017-07-31T09:21:29.334-0400: 31638.117: [GC (Allocation Failure) 2017-07-31T09:21:29.335-0400: 31638.117: [ParNew: 235968K->26176K(235968K), 0.0156127 secs] 1767506K->1567254K(4168128K), 0.0158792 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]

2017-07-31T09:21:29.402-0400: 31638.184: [GC (Allocation Failure) 2017-07-31T09:21:29.402-0400: 31638.185: [ParNew: 235968K->26176K(235968K), 0.0153380 secs] 1777046K->1577991K(4168128K), 0.0155085 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2017-07-31T09:21:29.474-0400: 31638.257: [GC (Allocation Failure) 2017-07-31T09:21:29.475-0400: 31638.257: [ParNew: 235968K->26176K(235968K), 0.0068952 secs] 1787783K->1588132K(4168128K), 0.0070781 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2017-07-31T09:21:29.562-0400: 31638.344: [GC (Allocation Failure) 2017-07-31T09:21:29.562-0400: 31638.344: [ParNew: 235968K->26176K(235968K), 0.0066656 secs] 1797924K->1596801K(4168128K), 0.0068402 secs] [Times: user=0.05 sys=0.01, real=0.00 secs]
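One thing that jumps out to me in this GC log: the young generation is only ~230 MB (235968K) inside a ~4 GB (4168128K) heap, so ParNew fires every 40-50 ms in the tail above. I summed the pauses with this awk sketch (assumes the -XX:+PrintGCDetails/-XX:+PrintGCDateStamps format shown above; adjust the file name for your install):

awk -F', ' '/ParNew/ { n++; sub(/ secs.*/, "", $2); total += $2 }
            END { printf "%d young-gen GCs, %.2f s total pause\n", n, total }' gc.log-*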

--------------------------------------------------

Ambari-metrics-collector.log:

2017-07-31 09:01:30,335 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:01:30 EDT 2017

2017-07-31 09:01:30,335 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:01:30 EDT 2017

2017-07-31 09:03:24,821 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506024492, lagBy: 180 seconds.

2017-07-31 09:03:24,821 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:03:24 EDT 2017, startTime = Mon Jul 31 09:00:24 EDT 2017, endTime = Mon Jul 31 09:02:24 EDT 2017

2017-07-31 09:03:29,731 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 6871 metric aggregates.

2017-07-31 09:03:30,175 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:03:30 EDT 2017

2017-07-31 09:03:30,176 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:03:30 EDT 2017

2017-07-31 09:05:24,600 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Last check point time: 1501506024492, lagBy: 300 seconds.

2017-07-31 09:05:24,600 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Start aggregation cycle @ Mon Jul 31 09:05:24 EDT 2017, startTime = Mon Jul 31 09:00:24 EDT 2017, endTime = Mon Jul 31 09:05:24 EDT 2017

2017-07-31 09:05:24,710 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Last check point time: 1501506024493, lagBy: 300 seconds.

2017-07-31 09:05:24,710 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Start aggregation cycle @ Mon Jul 31 09:05:24 EDT 2017, startTime = Mon Jul 31 09:00:24 EDT 2017, endTime = Mon Jul 31 09:05:24 EDT 2017

2017-07-31 09:05:24,825 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506144492, lagBy: 180 seconds.

2017-07-31 09:05:24,825 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:05:24 EDT 2017, startTime = Mon Jul 31 09:02:24 EDT 2017, endTime = Mon Jul 31 09:04:24 EDT 2017

2017-07-31 09:05:36,953 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: 3162 row(s) updated.

2017-07-31 09:05:36,953 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Mon Jul 31 09:00:24 EDT 2017, endTime = Mon Jul 31 09:05:24 EDT 2017

2017-07-31 09:05:36,953 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: End aggregation cycle @ Mon Jul 31 09:05:36 EDT 2017

2017-07-31 09:05:36,953 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: End aggregation cycle @ Mon Jul 31 09:05:36 EDT 2017

2017-07-31 09:05:39,338 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 6276 metric aggregates.

2017-07-31 09:05:39,838 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:05:39 EDT 2017

2017-07-31 09:05:39,838 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:05:39 EDT 2017

2017-07-31 09:05:56,461 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: 1680 row(s) updated.

2017-07-31 09:05:56,461 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Mon Jul 31 09:00:24 EDT 2017, endTime = Mon Jul 31 09:05:24 EDT 2017

2017-07-31 09:05:56,461 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: End aggregation cycle @ Mon Jul 31 09:05:56 EDT 2017

2017-07-31 09:05:56,461 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: End aggregation cycle @ Mon Jul 31 09:05:56 EDT 2017

2017-07-31 09:07:24,825 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506264492, lagBy: 180 seconds.

2017-07-31 09:07:24,825 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:07:24 EDT 2017, startTime = Mon Jul 31 09:04:24 EDT 2017, endTime = Mon Jul 31 09:06:24 EDT 2017

2017-07-31 09:07:30,144 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 5285 metric aggregates.

2017-07-31 09:07:30,489 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:07:30 EDT 2017

2017-07-31 09:07:30,489 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:07:30 EDT 2017

2017-07-31 09:09:24,825 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506384492, lagBy: 180 seconds.

2017-07-31 09:09:24,825 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:09:24 EDT 2017, startTime = Mon Jul 31 09:06:24 EDT 2017, endTime = Mon Jul 31 09:08:24 EDT 2017

2017-07-31 09:09:29,965 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 6374 metric aggregates.

2017-07-31 09:09:30,426 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:09:30 EDT 2017

2017-07-31 09:09:30,426 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:09:30 EDT 2017

2017-07-31 09:10:24,601 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Last check point time: 1501506324492, lagBy: 300 seconds.

2017-07-31 09:10:24,601 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Start aggregation cycle @ Mon Jul 31 09:10:24 EDT 2017, startTime = Mon Jul 31 09:05:24 EDT 2017, endTime = Mon Jul 31 09:10:24 EDT 2017

2017-07-31 09:10:24,710 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Last check point time: 1501506324493, lagBy: 300 seconds.

2017-07-31 09:10:24,710 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Start aggregation cycle @ Mon Jul 31 09:10:24 EDT 2017, startTime = Mon Jul 31 09:05:24 EDT 2017, endTime = Mon Jul 31 09:10:24 EDT 2017

2017-07-31 09:10:33,403 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: 3162 row(s) updated.

2017-07-31 09:10:33,403 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Mon Jul 31 09:05:24 EDT 2017, endTime = Mon Jul 31 09:10:24 EDT 2017

2017-07-31 09:10:33,403 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: End aggregation cycle @ Mon Jul 31 09:10:33 EDT 2017

2017-07-31 09:10:33,403 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: End aggregation cycle @ Mon Jul 31 09:10:33 EDT 2017

2017-07-31 09:10:52,945 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: 1633 row(s) updated.

2017-07-31 09:10:52,945 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Mon Jul 31 09:05:24 EDT 2017, endTime = Mon Jul 31 09:10:24 EDT 2017

2017-07-31 09:10:52,945 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: End aggregation cycle @ Mon Jul 31 09:10:52 EDT 2017

2017-07-31 09:10:52,945 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: End aggregation cycle @ Mon Jul 31 09:10:52 EDT 2017

2017-07-31 09:11:24,826 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506504492, lagBy: 180 seconds.

2017-07-31 09:11:24,826 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:11:24 EDT 2017, startTime = Mon Jul 31 09:08:24 EDT 2017, endTime = Mon Jul 31 09:10:24 EDT 2017

2017-07-31 09:11:29,970 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 6027 metric aggregates.

2017-07-31 09:11:30,360 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:11:30 EDT 2017

2017-07-31 09:11:30,360 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:11:30 EDT 2017

2017-07-31 09:13:24,826 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506624492, lagBy: 180 seconds.

2017-07-31 09:13:24,826 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:13:24 EDT 2017, startTime = Mon Jul 31 09:10:24 EDT 2017, endTime = Mon Jul 31 09:12:24 EDT 2017

2017-07-31 09:13:30,432 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 5377 metric aggregates.

2017-07-31 09:13:30,804 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:13:30 EDT 2017

2017-07-31 09:13:30,804 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:13:30 EDT 2017

2017-07-31 09:15:24,602 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Last check point time: 1501506624492, lagBy: 300 seconds.

2017-07-31 09:15:24,603 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Start aggregation cycle @ Mon Jul 31 09:15:24 EDT 2017, startTime = Mon Jul 31 09:10:24 EDT 2017, endTime = Mon Jul 31 09:15:24 EDT 2017

2017-07-31 09:15:24,711 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Last check point time: 1501506624493, lagBy: 300 seconds.

2017-07-31 09:15:24,711 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Start aggregation cycle @ Mon Jul 31 09:15:24 EDT 2017, startTime = Mon Jul 31 09:10:24 EDT 2017, endTime = Mon Jul 31 09:15:24 EDT 2017

2017-07-31 09:15:24,827 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506744492, lagBy: 180 seconds.

2017-07-31 09:15:24,828 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:15:24 EDT 2017, startTime = Mon Jul 31 09:12:24 EDT 2017, endTime = Mon Jul 31 09:14:24 EDT 2017

2017-07-31 09:15:37,728 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: 3162 row(s) updated.

2017-07-31 09:15:37,728 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Mon Jul 31 09:10:24 EDT 2017, endTime = Mon Jul 31 09:15:24 EDT 2017

2017-07-31 09:15:37,728 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: End aggregation cycle @ Mon Jul 31 09:15:37 EDT 2017

2017-07-31 09:15:37,728 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: End aggregation cycle @ Mon Jul 31 09:15:37 EDT 2017

2017-07-31 09:15:40,414 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 5748 metric aggregates.

2017-07-31 09:15:40,893 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:15:40 EDT 2017

2017-07-31 09:15:40,893 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:15:40 EDT 2017

2017-07-31 09:15:56,957 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: 1667 row(s) updated.

2017-07-31 09:15:56,957 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Mon Jul 31 09:10:24 EDT 2017, endTime = Mon Jul 31 09:15:24 EDT 2017

2017-07-31 09:15:56,957 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: End aggregation cycle @ Mon Jul 31 09:15:56 EDT 2017

2017-07-31 09:15:56,957 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: End aggregation cycle @ Mon Jul 31 09:15:56 EDT 2017

2017-07-31 09:17:24,827 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506864492, lagBy: 180 seconds.

2017-07-31 09:17:24,827 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:17:24 EDT 2017, startTime = Mon Jul 31 09:14:24 EDT 2017, endTime = Mon Jul 31 09:16:24 EDT 2017

2017-07-31 09:17:29,785 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 6974 metric aggregates.

2017-07-31 09:17:30,256 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:17:30 EDT 2017

2017-07-31 09:17:30,256 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:17:30 EDT 2017

2017-07-31 09:19:24,828 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501506984492, lagBy: 180 seconds.

2017-07-31 09:19:24,828 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:19:24 EDT 2017, startTime = Mon Jul 31 09:16:24 EDT 2017, endTime = Mon Jul 31 09:18:24 EDT 2017

2017-07-31 09:19:29,745 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 6379 metric aggregates.

2017-07-31 09:19:30,199 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:19:30 EDT 2017

2017-07-31 09:19:30,199 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:19:30 EDT 2017

2017-07-31 09:20:24,603 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Last check point time: 1501506924492, lagBy: 300 seconds.

2017-07-31 09:20:24,604 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Start aggregation cycle @ Mon Jul 31 09:20:24 EDT 2017, startTime = Mon Jul 31 09:15:24 EDT 2017, endTime = Mon Jul 31 09:20:24 EDT 2017

2017-07-31 09:20:24,712 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Last check point time: 1501506924493, lagBy: 300 seconds.

2017-07-31 09:20:24,712 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Start aggregation cycle @ Mon Jul 31 09:20:24 EDT 2017, startTime = Mon Jul 31 09:15:24 EDT 2017, endTime = Mon Jul 31 09:20:24 EDT 2017

2017-07-31 09:20:33,537 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: 3162 row(s) updated.

2017-07-31 09:20:33,537 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: Aggregated host metrics for METRIC_RECORD_MINUTE, with startTime = Mon Jul 31 09:15:24 EDT 2017, endTime = Mon Jul 31 09:20:24 EDT 2017

2017-07-31 09:20:33,537 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: End aggregation cycle @ Mon Jul 31 09:20:33 EDT 2017

2017-07-31 09:20:33,537 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricHostAggregator: End aggregation cycle @ Mon Jul 31 09:20:33 EDT 2017

2017-07-31 09:20:52,152 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: 1693 row(s) updated.

2017-07-31 09:20:52,152 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: Aggregated cluster metrics for METRIC_AGGREGATE_MINUTE, with startTime = Mon Jul 31 09:15:24 EDT 2017, endTime = Mon Jul 31 09:20:24 EDT 2017

2017-07-31 09:20:52,152 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: End aggregation cycle @ Mon Jul 31 09:20:52 EDT 2017

2017-07-31 09:20:52,152 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.v2.TimelineMetricClusterAggregator: End aggregation cycle @ Mon Jul 31 09:20:52 EDT 2017

2017-07-31 09:21:24,829 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Last check point time: 1501507104492, lagBy: 180 seconds.

2017-07-31 09:21:24,829 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Start aggregation cycle @ Mon Jul 31 09:21:24 EDT 2017, startTime = Mon Jul 31 09:18:24 EDT 2017, endTime = Mon Jul 31 09:20:24 EDT 2017

2017-07-31 09:21:29,723 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: Saving 5723 metric aggregates.

2017-07-31 09:21:30,127 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:21:30 EDT 2017

2017-07-31 09:21:30,127 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.aggregators.TimelineMetricClusterAggregatorSecond: End aggregation cycle @ Mon Jul 31 09:21:30 EDT 2017

--------------------------------------------------

Ambari-agent log:

INFO 2017-07-31 09:26:00,810 logger.py:67 - Host contains mounts: ['/proc', '/sys', '/dev', '/sys/kernel/security', '/dev/shm', '/dev/pts', '/run', '/sys/fs/cgroup', '/sys/fs/cgroup/systemd', '/sys/fs/pstore', '/sys/fs/cgroup/cpuset', '/sys/fs/cgroup/cpu,cpuacct', '/sys/fs/cgroup/memory', '/sys/fs/cgroup/devices', '/sys/fs/cgroup/freezer', '/sys/fs/cgroup/net_cls', '/sys/fs/cgroup/blkio', '/sys/fs/cgroup/perf_event', '/sys/fs/cgroup/hugetlb', '/sys/kernel/config', '/', '/var/lib/nfs/rpc_pipefs', '/proc/sys/fs/binfmt_misc', '/sys/kernel/debug', '/dev/hugepages', '/dev/mqueue', '/proc/sys/fs/binfmt_misc'].

INFO 2017-07-31 09:26:00,810 logger.py:67 - Mount point for directory /hadoop/hdfs/data is /

INFO 2017-07-31 09:26:00,837 logger.py:67 - Execute['export HIVE_CONF_DIR='/usr/iop/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://bi-node-3.novalocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=20 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/iop/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 60}

INFO 2017-07-31 09:26:06,697 Heartbeat.py:78 - Building Heartbeat: {responseId = 25227, timestamp = 1501507566697, commandsInProgress = False, componentsMapped = True}

INFO 2017-07-31 09:26:06,702 Controller.py:265 - Heartbeat response received (id = 25228)

INFO 2017-07-31 09:26:16,703 Heartbeat.py:78 - Building Heartbeat: {responseId = 25228, timestamp = 1501507576703, commandsInProgress = False, componentsMapped = True}

INFO 2017-07-31 09:26:16,863 Controller.py:265 - Heartbeat response received (id = 25229)

INFO 2017-07-31 09:26:16,863 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,863 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,863 ActionQueue.py:99 - Adding STATUS_COMMAND for service FLUME of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service HBASE of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service HBASE of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service HIVE of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service KNOX of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service RANGER of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service RANGER of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:16,864 ActionQueue.py:99 - Adding STATUS_COMMAND for service ZOOKEEPER of cluster T2P_ACE to the queue.

INFO 2017-07-31 09:26:26,866 Heartbeat.py:78 - Building Heartbeat: {responseId = 25229, timestamp = 1501507586865, commandsInProgress = False, componentsMapped = True}

INFO 2017-07-31 09:26:26,917 Controller.py:265 - Heartbeat response received (id = 25230)

INFO 2017-07-31 09:26:36,918 Heartbeat.py:78 - Building Heartbeat: {responseId = 25230, timestamp = 1501507596918, commandsInProgress = False, componentsMapped = True}

INFO 2017-07-31 09:26:37,006 logger.py:67 - call['test -w /'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:26:37,024 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:26:37,025 logger.py:67 - call['test -w /dev'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:26:37,040 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:26:37,041 logger.py:67 - call['test -w /dev/shm'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:26:37,055 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:26:37,055 logger.py:67 - call['test -w /run'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:26:37,068 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:26:37,069 logger.py:67 - call['test -w /sys/fs/cgroup'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:26:37,081 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:26:37,087 Controller.py:265 - Heartbeat response received (id = 25231)

INFO 2017-07-31 09:26:47,088 Heartbeat.py:78 - Building Heartbeat: {responseId = 25231, timestamp = 1501507607087, commandsInProgress = False, componentsMapped = True}

INFO 2017-07-31 09:26:47,093 Controller.py:265 - Heartbeat response received (id = 25232)

INFO 2017-07-31 09:26:57,094 Heartbeat.py:78 - Building Heartbeat: {responseId = 25232, timestamp = 1501507617093, commandsInProgress = False, componentsMapped = True}

INFO 2017-07-31 09:26:57,099 Controller.py:265 - Heartbeat response received (id = 25233)

INFO 2017-07-31 09:27:00,843 logger.py:67 - Execute['export HIVE_CONF_DIR='/usr/iop/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://bi-node-3.novalocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=20 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/iop/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 60}

--------------------------------------------------

Ambari-agent alert log:

INFO 2017-07-31 09:27:00,843 logger.py:67 - Execute['export HIVE_CONF_DIR='/usr/iop/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://bi-node-3.novalocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=20 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/iop/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 60}

INFO 2017-07-31 09:27:37,380 logger.py:67 - call['test -w /'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:27:37,393 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:27:37,393 logger.py:67 - call['test -w /dev'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:27:37,406 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:27:37,407 logger.py:67 - call['test -w /dev/shm'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:27:37,418 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:27:37,419 logger.py:67 - call['test -w /run'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:27:37,431 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:27:37,431 logger.py:67 - call['test -w /sys/fs/cgroup'] {'sudo': True, 'timeout': 5}

INFO 2017-07-31 09:27:37,444 logger.py:67 - call returned (0, '')

INFO 2017-07-31 09:28:00,831 logger.py:67 - Host contains mounts: ['/proc', '/sys', '/dev', '/sys/kernel/security', '/dev/shm', '/dev/pts', '/run', '/sys/fs/cgroup', '/sys/fs/cgroup/systemd', '/sys/fs/pstore', '/sys/fs/cgroup/cpuset', '/sys/fs/cgroup/cpu,cpuacct', '/sys/fs/cgroup/memory', '/sys/fs/cgroup/devices', '/sys/fs/cgroup/freezer', '/sys/fs/cgroup/net_cls', '/sys/fs/cgroup/blkio', '/sys/fs/cgroup/perf_event', '/sys/fs/cgroup/hugetlb', '/sys/kernel/config', '/', '/var/lib/nfs/rpc_pipefs', '/proc/sys/fs/binfmt_misc', '/sys/kernel/debug', '/dev/hugepages', '/dev/mqueue', '/proc/sys/fs/binfmt_misc'].

INFO 2017-07-31 09:28:00,832 logger.py:67 - Mount point for directory /hadoop/hdfs/data is /

INFO 2017-07-31 09:28:00,859 logger.py:67 - Execute['export HIVE_CONF_DIR='/usr/iop/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://bi-node-3.novalocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=20 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/iop/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 60}

Any help would be greatly appreciated. Also, if there are any known bugs in these components, please advise. I have no option of using a different version of Ambari because IBM requires this one for Big Insights, so if I can patch/fix my version, that is preferable.

Thank you very much for help!

Freemon

61 REPLIES

Super Mentor

@Freemon Johnson

- How frequently do you see that the AMS goes down? Please try turning AUTO_START off for the AMS, as it can cause a race condition in older versions of Ambari. See: https://issues.apache.org/jira/browse/AMBARI-15492

In Ambari 2.2.x, the Ambari server sends information to the agents about the components that have been enabled for auto start, along with the other auto start properties in ambari.properties. Please remove the auto start entries for the Metrics Collector in ambari.properties:

recovery.type=AUTO_START
recovery.enabled_components=METRICS_COLLECTOR
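
One way to do this (a minimal sketch, assuming the default server config path of /etc/ambari-server/conf/ambari.properties) is to comment out both lines and restart the server:

sed -i 's/^recovery.type=AUTO_START/#&/' /etc/ambari-server/conf/ambari.properties
sed -i 's/^recovery.enabled_components=METRICS_COLLECTOR/#&/' /etc/ambari-server/conf/ambari.properties
ambari-server restart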

- Are you running AMS in Embedded or Distributed mode? https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode
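
You can check the current mode under "Advanced ams-site"; per the linked wiki page, the relevant property is:

timeline.metrics.service.operation.mode=embedded

(the default is embedded; set it to "distributed" to have the AMS HBase store its data in HDFS)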

- Have you set the heap size for your AMS based on the number of cluster nodes as described in the following tuning guide? https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning
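
For reference, the heap settings that guide sizes by node count live in "ams-env" and "ams-hbase-env"; the values below are placeholders, not recommendations:

metrics_collector_heapsize=1024
hbase_master_heapsize=1024
hbase_regionserver_heapsize=1024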

- Have you tried cleaning out the old AMS metrics? Sometimes this helps if the "hbase.rootdir" or the "hbase.zookeeper.property.dataDir" data gets corrupted: https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data

In your case we see the following, which might be related to ZooKeeper data corruption, so a cleanup might help:

WARN[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)

.

Explorer

Hi @Jay SenSharma,

First thank you very much for responding so quickly! The answers to your questions are in line below:

How frequently do you see that the AMS goes down?

It has never run. It has been down since I installed it.

Please try turning AUTO_START off for the AMS, as it can cause a race condition in older versions of Ambari. See: https://issues.apache.org/jira/browse/AMBARI-15492

I have not tried this, so I will give it a shot.

Are you running AMS in Embedded or Distributed mode? https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode

Honestly, I don't know. I performed a default installation with no customization on my end; if anything was customized, it was unintentional. I will check, though.

Have you set the heap size for your AMS based on the number of cluster nodes as described in the following tuning guide? https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning

Yes, I tried so many variations of this that I am exhausted with the possibilities, especially since Ambari's recommendations contradict each change/modification.

Have you tried cleaning out the old AMS metrics? Sometimes this helps if the "hbase.rootdir" or the "hbase.zookeeper.property.dataDir" data gets corrupted: https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data

Yes, I deleted both HBase directories, but I have not deleted ZooKeeper's data directory. I will delete both again and restart the service.

Regards,

Freemon

Explorer

@Jay SenSharma

I have completed the steps you identified and restarted the Ambari Metrics Collector and Monitor, but Ambari still shows "n/a" and "No Data Available". I have some questions below to make sure I did the steps correctly:

Please try turning AUTO_START off for the AMS, as it can cause a race condition in older versions of Ambari. See: https://issues.apache.org/jira/browse/AMBARI-15492

I commented out the lines for both recovery.type=AUTO_START and recovery.enabled_components=METRICS_COLLECTOR.

Was this correct or was I supposed to supply a value of "false, off, etc."? I could not tell from the directions in the link provided.

Are you running AMS in Embedded or Distributed mode? https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode

My configuration was running in "Embedded" mode, so I changed it to "Distributed", along with all of the parameters stated in the directions.

Have you tried cleaning out the old AMS metrics? Sometimes this helps if the "hbase.rootdir" or the "hbase.zookeeper.property.dataDir" data gets corrupted: https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data

So I performed this; however, once I configured Ambari Metrics for distributed mode, the store moved to the HBase folder on the NameNode instead of a flat file on the node's local file system. There is a folder in the HDFS hbase/ directory entitled "hbase-staging". Was I supposed to remove that as well? I did not, because I assumed the folder was clean once I configured Ambari Metrics for distributed mode. If I need to remove that folder as well and restart the Monitor and Collector, please advise either way.

Thank you!

Freemon

@Freemon Johnson

This question is very difficult to follow. Our practice is to attach the logs to the question rather than copying them into the question body.

I'm going to ask a trivial question. Are your Ambari agents running?

Explorer

@Constantin Stanca

My apologies for the forum decorum; this is my first time posting to this group. In the future I will attach the logs.

To answer your question, all agents are running on the nodes.

Explorer

@Constantin Stanca

I wasn't sure if you were aware, but the maximum upload size per file is ~524.3 KB, and all of my log files are larger than that.

Mentor

@Freemon Johnson

Locate the logs

I encountered the same issue on a project; here is what I did. Locate the log files, on the host where the Metrics Collector is installed, in the following directory:

/var/log/ambari-metrics-collector/

Here's a list of the logs I started digging through:

- hbase-ams-master-[Server].out
- hbase-ams-master-[Server].log
- ambari-metrics-collector-startup.out
- ambari-metrics-collector.out
- ambari-metrics-collector.log

Stop the AMS Collector

Kill all of the processes:

# ps aux | grep ams

If any are still alive:

# kill -15 <pid>

Log on to the Ambari server and clear the contents of the collector's ZooKeeper data directory:

# rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/*

Back up the following directories by renaming them:

# cd /var/lib/ambari-metrics-collector
# mv hbase hbase.bak
# mv hbase-tmp hbase-tmp.bak

Restart the Ambari Metrics Collector. This should resolve the problem.
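
If you prefer the command line to the Ambari UI for the restart, the collector package also ships a service script (a sketch, assuming the standard package layout):

/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf stop
/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf start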

Explorer

@Geoffrey Shelton Okot

Hi Geoffrey,

I tried the above steps and unfortunately it did not work. I am still showing n/a and "No Data Available". Attached are all of the logs you identified in your troubleshooting.

My log files are larger than this site's upload limit, so you can download a zipped version of all of my Ambari logs here:

https://drive.google.com/file/d/0B_JSEbJxdC-QOHR5VTZuQWFWemc/view?usp=sharing

V/r,
Freemon

Super Mentor

@Freemon Johnson

In your "ambari-metrics-collector.log" we see the following error which usually indicates zookeeper data corruption.

2017-08-01 14:16:08,958 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x7a35b0f5-0x15d9e3522180005, quorum=xxxxxxxxxx:61181, baseZNode=/hbase Unable to get data of znode /hbase/table/SYSTEM.CATALOG org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/table/SYSTEM.CATALOG
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)




Also, in your "hbase-ams-master-bi-node-3.novalocal.log" we see the following ZooKeeper-related issue:

2017-08-01 14:15:49,937 ERROR [main] zookeeper.ZooKeeperWatcher: clean znode for master0x0, quorum=xxxxxxxxxx:61181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)



Hence, can you please try to clear the contents of "/var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/*" and restart the Ambari Metrics Collector. The above path might be different in your case; you can find it in your AMS config ("Advanced ams-hbase-site") in the following properties:

hbase.tmp.dir = /var/lib/ambari-metrics-collector/hbase-tmp
hbase.zookeeper.property.dataDir = ${hbase.tmp.dir}/zookeeper
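
A minimal sketch of that cleanup, assuming the default paths above (stop the Metrics Collector first, then start it again from the Ambari UI afterwards):

rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/*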

.

Mentor

@Freemon Johnson

To fix this, delete the znode in ZooKeeper and restart the AMS services.

Check the current value of the Ambari Metrics config: zookeeper.znode.parent. In my case the value was: /ams-hbase-unsecure

Stop all of the AMS services as described above.

Then, using the ZooKeeper CLI, connect to your ZooKeeper cluster and delete the znode:

1. Start up the zookeeper client

    # /usr/hdp/current/zookeeper-client/bin/zkCli.sh 

2. Connect to the zookeeper cluster, and list the root nodes for the cluster

[zk: localhost:2181(CONNECTED) 1] connect {zookeeper_host}:2181
[zk: {zookeeper_host}:2181(CONNECTED) 2] ls /
[hiveserver2, zookeeper, hbase-unsecure, ams-hbase-unsecure, rmstore] 

3. Delete the ams-hbase-unsecure node

[zk: {zookeeper_host}:2181(CONNECTED) 3] rmr /ams-hbase-unsecure 

4. Restart the AMS services as described above and then confirm that the ams-hbase-unsecure node was recreated in zookeeper.

[zk: {zookeeper_host}:2181(CONNECTED) 4] ls /     
[hiveserver2, zookeeper, hbase-unsecure, ams-hbase-unsecure, rmstore]

Please revert

Explorer

@Geoffrey Shelton Okot

The parameter zookeeper.znode.parent was never populated in my configuration.

When I attempt to run zkCli.sh I receive the following error:

[centos@bi-node-3 ~]$ /usr/iop/4.2.0.0/zookeeper/bin/zkCli.sh

Connecting to localhost:2181

2017-08-03 09:15:00,871 - INFO[main:Environment@100] - Client environment:zookeeper.version=3.4.6-IBM_4--1, built on 06/17/2016 01:58 GMT

2017-08-03 09:15:00,872 - INFO[main:Environment@100] - Client environment:host.name=bi-node-3.novalocal

2017-08-03 09:15:00,873 - INFO[main:Environment@100] - Client environment:java.version=1.8.0_77

2017-08-03 09:15:00,874 - INFO[main:Environment@100] - Client environment:java.vendor=Oracle Corporation

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_77/jre

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:java.class.path=/usr/iop/4.2.0.0/zookeeper/bin/../build/classes:/usr/iop/4.2.0.0/zookeeper/bin/../build/lib/*.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/log4j-1.2.17.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/iop/4.2.0.0/zookeeper/bin/../zookeeper-3.4.6_IBM_4.jar:/usr/iop/4.2.0.0/zookeeper/bin/../src/java/lib/*.jar:/usr/iop/4.2.0.0/zookeeper/bin/../conf::/usr/share/zookeeper/*

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:java.io.tmpdir=/tmp

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:java.compiler=<NA>

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:os.name=Linux

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:os.arch=amd64

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:os.version=3.10.0-229.7.2.el7.x86_64

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:user.name=centos

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:user.home=/home/centos

2017-08-03 09:15:00,875 - INFO[main:Environment@100] - Client environment:user.dir=/home/centos

2017-08-03 09:15:00,877 - INFO[main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@27fa135a

Welcome to ZooKeeper!

2017-08-03 09:15:00,901 - INFO[main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)

JLine support is enabled

2017-08-03 09:15:00,956 - INFO[main-SendThread(localhost:2181):ClientCnxn$SendThread@852] - Socket connection established to localhost/127.0.0.1:2181, initiating session

2017-08-03 09:15:00,967 - INFO[main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x35d9e32bb18000e, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0] connect {bi-node-3.novalocal}:2181

2017-08-03 09:15:18,126 - INFO[main:ZooKeeper@684] - Session: 0x35d9e32bb18000e closed

2017-08-03 09:15:18,126 - INFO[main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down

2017-08-03 09:15:18,127 - INFO[main:ZooKeeper@438] - Initiating client connection, connectString={bi-node-3.novalocal}:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@504bae78

Exception in thread "main" java.net.UnknownHostException: {bi-node-3.novalocal}: unknown error

at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)

at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)

at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)

at java.net.InetAddress.getAllByName0(InetAddress.java:1276)

at java.net.InetAddress.getAllByName(InetAddress.java:1192)

at java.net.InetAddress.getAllByName(InetAddress.java:1126)

at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)

at org.apache.zookeeper.ZooKeeperMain.connectToZK(ZooKeeperMain.java:273)

at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:668)

at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:591)

at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:363)

at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)

at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
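
(Note: {zookeeper_host} in the steps above is a placeholder; drop the braces and use the bare hostname, for example:

connect bi-node-3.novalocal:2181

Otherwise the client treats the braces as part of the hostname, which is what produces the UnknownHostException above.)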

Explorer
@Geoffrey Shelton Okot

If I do not issue the command connect {zookeeper_host}:2181 and just do an "ls" on a ZooKeeper node, I have the following nodes:

[controller_epoch, hiveserver2, controller, brokers, zookeeper, hbase-unsecure, solr, rmstore, admin, isr_change_notification, consumers, config]

As you can see I do not have a node entitled ams-hbase-unsecure.

Mentor

@Freemon Johnson

What distribution of Hadoop are you running?

/usr/iop/4.2.0.0/zookeeper/bin

I see the above path, which is unusual in HDP installations.

Explorer

@Geoffrey Shelton Okot

You are correct. This is a fork or snapshot of HDP that IBM refers to as IOP. Big Insights runs on HDP. I am running Big Insights version 4.2 as well as IOP version 4.2.

The script zkCli.sh is only found at the path I have provided.

Mentor

@Freemon Johnson

No wonder the locations of some files are not identical.

Could you try the following steps:

1. Run `ps -ef | grep metrics` and kill all of the metrics processes that are caching the `/hbase` value.

2. Tail the collector log while you do the steps below:

tail -f /var/log/ambari-metrics-collector/ambari-metrics-collector.log

3. Stop Ambari.

4. Kill all the metrics processes.

5. curl --user admin:admin -i -H "X-Requested-By: ambari" -X DELETE http://`hostname -f`:8080/api/v1/clusters/CLUSTERNAME/services/AMBARI_METRICS

=> Make sure you replace CLUSTERNAME with your cluster name

6. Refresh Ambari UI

7. Add Service

8. Select Ambari Metrics

9. In the configuration screen, make sure to set the value of `zookeeper.znode.parent` to what is configured in the HBase service. By default, in Ambari Metrics it is set to an empty value.

10. Deploy

And let me know if that helps

Explorer

@Geoffrey Shelton Okot

When I attempt to delete AMBARI_METRICS I receive the following HTTP error:

{
  "status" : 500,
  "message" : "org.apache.ambari.server.controller.spi.SystemException: An internal system exception occurred: Cannot remove AMBARI_METRICS. Desired state STARTED is not removable.Service must be stopped or disabled."
}

I am running version 2.2 of Ambari Server, and the issue listed below was not fixed until version 2.3:

https://issues.apache.org/jira/browse/AMBARI-12651

Explorer

@Geoffrey Shelton Okot

This post helped, and I was able to delete AMBARI_METRICS:

https://stackoverflow.com/questions/31949091/error-deleting-service-using-ambari
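
For reference, the fix from that post is to stop the service through the REST API before issuing the DELETE (a sketch, assuming default admin credentials and port 8080; replace CLUSTERNAME as before):

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop AMBARI_METRICS"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://`hostname -f`:8080/api/v1/clusters/CLUSTERNAME/services/AMBARI_METRICS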

I will let you know after I redeploy if this is solved or not.

Explorer

@Geoffrey Shelton Okot

I successfully deleted Ambari Metrics, then reinstalled and redeployed, but I still have "n/a" and "No Data Available" on my widgets. The cluster is green and all monitors are running properly.

My logs are attached.

https://drive.google.com/file/d/0B_JSEbJxdC-QcW1FSXNsWGhpbmc/view?usp=sharing

Mentor

@Freemon Johnson

We have exchanged several messages, but I never asked: which operating system (Ubuntu/RHEL) and version are you running?

- Have you checked your firewall settings?
- How much memory do you have on the problematic node?
- Can you share your /etc/hosts entries?
- Can you share your ZooKeeper config file (zoo.cfg)?
- I am still going through your logs.

While I am analyzing your logs, can you try the following:

Shut down the Ambari Metrics Monitors and Collector via Ambari.

Clear out this directory for a fresh restart:

/var/lib/ambari-metrics-collector

1. From Ambari -> Ambari Metrics -> Config -> Advanced ams-hbase-site, get the hbase.rootdir and hbase.tmp.dir directories, typically:

hbase.rootdir = /var/lib/ambari-metrics-collector/hbase
rm -rf /var/lib/ambari-metrics-collector/hbase/*

hbase.tmp.dir = /var/lib/ambari-metrics-collector/hbase-tmp
rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/*

2. Start AMS through the Ambari UI.

The graphs will start to display after a few minutes, once all services are green (online).