Created 03-16-2016 07:41 AM
Hi,
I built a new standalone HDP 2.3 cluster. It is not yet used by any application, but the Ambari Metrics service shuts down and restarts for no apparent reason, two to four times a week.
Here is an excerpt from my ambari-metrics-collector.log:
2016-03-14 01:29:07,166 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM
2016-03-14 01:29:07,169 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2016-03-14 01:29:07,182 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:6188
2016-03-14 01:29:07,207 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1536be5998a0001
2016-03-14 01:29:07,208 INFO org.apache.zookeeper.ZooKeeper: Session: 0x1536be5998a0001 closed
2016-03-14 01:29:07,208 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-03-14 01:29:07,286 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping phoenix metrics system...
2016-03-14 01:29:07,289 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system stopped.
2016-03-14 01:29:07,289 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: phoenix metrics system shutdown complete.
2016-03-14 01:29:07,290 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl: Stopping ApplicationHistory
2016-03-14 01:29:07,290 INFO org.apache.hadoop.ipc.Server: Stopping server on 60200
2016-03-14 01:29:07,294 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-03-14 01:29:07,294 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 60200
2016-03-14 01:29:07,295 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: SHUTDOWN_MSG:
Do you know the reason for these restarts?
Thanks,
Gauthier
Created 03-30-2016 09:34 PM
This could be caused by https://issues.apache.org/jira/browse/AMBARI-15492, which is fixed in the next Ambari release (2.2.2).
As a workaround, please try commenting out these two properties in /etc/ambari-server/conf/ambari.properties:
#recovery.enabled_components=METRICS_COLLECTOR
#recovery.type=AUTO_START
Restart Ambari Server.
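For reference, here is a rough sketch of the workaround run on the Ambari Server host. It assumes the two properties are present and uncommented in the default ambari.properties location shown above; adjust if your setup differs:

# keep a backup of the original file
sudo cp /etc/ambari-server/conf/ambari.properties /etc/ambari-server/conf/ambari.properties.bak
# comment out the two auto-recovery properties
sudo sed -i 's/^recovery.enabled_components=METRICS_COLLECTOR/#&/' /etc/ambari-server/conf/ambari.properties
sudo sed -i 's/^recovery.type=AUTO_START/#&/' /etc/ambari-server/conf/ambari.properties
# restart Ambari Server so the change takes effect
sudo ambari-server restart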
Created 04-06-2016 09:40 PM
What do you see in the AMS HBase logs? You will find them in the same directory as Ambari metrics collector logs.
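For example, something like this should surface the relevant entries around the time of the SIGTERM (assuming the default log directory /var/log/ambari-metrics-collector and the default embedded-HBase log names such as hbase-ams-master-*.log; adjust the paths if yours differ):

# collector and AMS HBase logs live in the same directory by default
ls -ltr /var/log/ambari-metrics-collector/
# look for fatal errors or aborts in the AMS HBase master log
grep -iE 'FATAL|ERROR|abort' /var/log/ambari-metrics-collector/hbase-ams-master-*.log | tail -n 50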
Created 04-08-2016 05:42 AM
Nothing interesting, I think:
2016-04-08 02:07:12,581 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /1.1.1.1:51610 (no session established for client)
2016-04-08 02:07:32,416 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=3.66 GB, freeSize=197.28 MB, max=3.85 GB, blockCount=254167, accesses=2466441, hits=1745379, hitRatio=70.77%, , cachingAccesses=1844293, cachingHits=1469624, cachingHitsRatio=79.68%, evictions=4889, evicted=120445, evictedPerRun=24.63591766357422
2016-04-08 02:08:12,546 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /1.1.1.1:51681
2016-04-08 02:08:12,547 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2016-04-08 02:08:12,547 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /1.1.1.1:51681 (no session established for client)
2016-04-08 02:09:12,557 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /1.1.1.1:51749
2016-04-08 02:09:12,557 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2016-04-08 02:09:12,560 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /1.1.1.1:51749 (no session established for client)
2016-04-08 02:10:12,599 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxnFactory: Accepted socket connection from /1.1.1.1:51827
2016-04-08 02:10:12,600 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
Created 04-08-2016 07:25 PM
Can you share the Ambari Agent log on the Metrics collector host? That might give us some useful info.
/var/log/ambari-agent/ambari-agent.log
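For example, grepping the agent log for the collector component around the restart time should show whether the agent itself issued the stop/start; the search terms below are just a suggestion:

grep -iE 'METRICS_COLLECTOR|recovery' /var/log/ambari-agent/ambari-agent.log | tail -n 100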
Created 04-12-2016 07:24 AM
WARNING 2016-04-11 23:26:12,528 base_alert.py:417 - [Alert][namenode_hdfs_pending_deletion_blocks] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,536 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,546 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,554 base_alert.py:417 - [Alert][namenode_hdfs_blocks_health] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,554 base_alert.py:417 - [Alert][namenode_hdfs_capacity_utilization] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2016-04-11 23:26:12,553 logger.py:67 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/home', '/tmp', '/usr', '/var', '/var/cache/yum', '/var/lib/rpm', '/var/log', '/var/tmp', '/hadoop', '/proc/sys/fs/binfmt_misc'].
WARNING 2016-04-11 23:26:12,558 base_alert.py:417 - [Alert][namenode_rpc_latency] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-04-11 23:26:12,560 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2016-04-11 23:26:12,562 logger.py:67 - Mount point for directory /hadoop/hdfs/data is /
WARNING 2016-04-11 23:26:12,588 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}