Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

AMBARI METRICS restart randomly


Hi,

I built a new standalone HDP 2.3 node. It is not used by any application yet, but the Ambari Metrics service shuts down and restarts for no apparent reason. This happens two to four times a week.

Here is the relevant part of my ambari-metrics-collector.log:

2016-03-14 01:29:07,166 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM
2016-03-14 01:29:07,169 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2016-03-14 01:29:07,182 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:6188
2016-03-14 01:29:07,207 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1536be5998a0001
2016-03-14 01:29:07,208 INFO org.apache.zookeeper.ZooKeeper: Session: 0x1536be5998a0001 closed
2016-03-14 01:29:07,208 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-03-14 01:29:07,286 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping phoenix metrics system...
2016-03-14 01:29:07,289 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system stopped.
2016-03-14 01:29:07,289 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system shutdown complete.
2016-03-14 01:29:07,290 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl: Stopping ApplicationHistory
2016-03-14 01:29:07,290 INFO org.apache.hadoop.ipc.Server: Stopping server on 60200
2016-03-14 01:29:07,294 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-03-14 01:29:07,294 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 60200
2016-03-14 01:29:07,295 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: SHUTDOWN_MSG:
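Every entry in this excerpt follows the SIGTERM on the first line, so the collector appears to be stopped by an external signal rather than failing on its own. To see how often that happens, a minimal Python sketch can count SIGTERM entries per day. It assumes one log entry per line in the "date time,millis LEVEL logger: message" layout shown above; the function names and the default log path are hypothetical and should be adjusted to your installation.

```python
import re
from collections import Counter

# Assumes the "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger: message" layout seen in the
# collector log excerpt; only the date part is captured.
SIGTERM_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2},\d{3} ERROR .*RECEIVED SIGNAL 15: SIGTERM"
)


def sigterm_days(lines):
    """Return a Counter mapping each date to the number of SIGTERM entries on that date."""
    counts = Counter()
    for line in lines:
        match = SIGTERM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def report(path="/var/log/ambari-metrics-collector/ambari-metrics-collector.log"):
    """Print one "date count" line per day; the default path is an assumption."""
    with open(path) as log:
        for day, count in sorted(sigterm_days(log).items()):
            print(day, count)
```

Calling report() (or report() with the real log path) lists the dates on which the collector received SIGTERM, which can then be compared with the Ambari Agent log on the same host.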

Do you know what is causing these restarts?

Thanks,

Gauthier

1 ACCEPTED SOLUTION

Super Collaborator

This could be caused by https://issues.apache.org/jira/browse/AMBARI-15492, which has been fixed in the next Ambari release (2.2.2).

As a workaround, try commenting out these two properties in /etc/ambari-server/conf/ambari.properties:

#recovery.enabled_components=METRICS_COLLECTOR

#recovery.type=AUTO_START

Then restart Ambari Server (ambari-server restart).
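If you prefer to script the workaround, a minimal Python sketch could comment out the two properties and keep a backup of the original file. The function names and the ".bak" backup convention are hypothetical; only the file path and the property names come from the answer above.

```python
import shutil

# The two properties named in the workaround for AMBARI-15492.
RECOVERY_PROPS = ("recovery.enabled_components", "recovery.type")


def comment_out(text, props=RECOVERY_PROPS):
    """Return `text` with every active `key=value` line for `props` commented out."""
    result = []
    for line in text.splitlines(keepends=True):
        key = line.split("=", 1)[0].strip()
        # A line that is already commented has a key like "#recovery.type",
        # so it never matches and is left untouched (the edit is idempotent).
        if key in props:
            line = "#" + line
        result.append(line)
    return "".join(result)


def apply_workaround(path="/etc/ambari-server/conf/ambari.properties"):
    """Copy the file to `<path>.bak`, then rewrite it with the recovery properties commented out."""
    with open(path) as f:
        original = f.read()
    shutil.copy2(path, path + ".bak")  # keep a backup before editing
    with open(path, "w") as f:
        f.write(comment_out(original))
```

Run apply_workaround() as a user allowed to write the file, then restart Ambari Server as described above. Because already-commented lines are skipped, running it twice does not add extra "#" characters.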


13 REPLIES

Super Collaborator

@GAUTHIER CHRETIEN

What do you see in the AMS HBase logs? You will find them in the same directory as the Ambari Metrics Collector logs.


Nothing interesting, I think:

2016-04-08 02:07:12,581 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /1.1.1.1:51610 (no session established for client)
2016-04-08 02:07:32,416 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=3.66 GB, freeSize=197.28 MB, max=3.85 GB, blockCount=254167, accesses=2466441, hits=1745379, hitRatio=70.77%, , cachingAccesses=1844293, cachingHits=1469624, cachingHitsRatio=79.68%, evictions=4889, evicted=120445, evictedPerRun=24.63591766357422
2016-04-08 02:08:12,546 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /1.1.1.1:51681
2016-04-08 02:08:12,547 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2016-04-08 02:08:12,547 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /1.1.1.1:51681 (no session established for client)
2016-04-08 02:09:12,557 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /1.1.1.1:51749
2016-04-08 02:09:12,557 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2016-04-08 02:09:12,560 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /1.1.1.1:51749 (no session established for client)
2016-04-08 02:10:12,599 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /1.1.1.1:51827
2016-04-08 02:10:12,600 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)

Super Collaborator

Can you share the Ambari Agent log on the Metrics collector host? That might give us some useful info.

/var/log/ambari-agent/ambari-agent.log
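To narrow that log down, one option is to keep only the agent entries written close to a SIGTERM time taken from the collector log. The minimal Python sketch below does this. It assumes the agent mentions the component by its name, METRICS_COLLECTOR, when it acts on it, which the excerpts in this thread do not confirm. The function name, the keyword default, and the five-minute window are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Agent entries look like "INFO 2016-04-11 23:26:12,553 logger.py:67 - ...",
# so the timestamp is searched for anywhere in the line.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def entries_near(lines, when, minutes=5, keyword="METRICS_COLLECTOR"):
    """Yield lines containing `keyword` whose timestamp is within `minutes` of `when`."""
    window = timedelta(minutes=minutes)
    for line in lines:
        if keyword not in line:
            continue
        match = TS_RE.search(line)
        if not match:
            continue  # continuation lines such as stack traces carry no timestamp
        ts = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
        if abs(ts - when) <= window:
            yield line
```

For example, with the SIGTERM from the question, iterate over entries_near(open("/var/log/ambari-agent/ambari-agent.log"), datetime(2016, 3, 14, 1, 29, 7)) and print each line. If nothing matches, try widening the window or changing the keyword before concluding that the agent was not involved.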


NG 2016-04-11 23:26:12,528 base_alert.py:417 - [Alert][namenode_hdfs_pending_deletion_blocks] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,536 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,546 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,554 base_alert.py:417 - [Alert][namenode_hdfs_blocks_health] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,554 base_alert.py:417 - [Alert][namenode_hdfs_capacity_utilization] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2016-04-11 23:26:12,553 logger.py:67 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/home', '/tmp', '/usr', '/var', '/var/cache/yum', '/var/lib/rpm', '/var/log', '/var/tmp', '/hadoop', '/proc/sys/fs/binfmt_misc'].
WARNING 2016-04-11 23:26:12,558 base_alert.py:417 - [Alert][namenode_rpc_latency] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,560 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2016-04-11 23:26:12,562 logger.py:67 - Mount point for directory /hadoop/hdfs/data is /
WARNING 2016-04-11 23:26:12,588 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}