WARNING 2017-05-09 10:07:21,647 base_alert.py:140 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:07:21,649 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
INFO 2017-05-09 10:07:21,665 logger.py:67 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2017-05-09 10:07:21,665 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on xxx.yyy.ZZZ']
ERROR 2017-05-09 10:07:21,668 script_alert.py:112 - [Alert][ambari_agent_disk_usage] Failed with result CRITICAL: ['Capacity Used: [87.14%, 41.3 GB], Capacity Total: [47.4 GB], path=/usr/hdp']
INFO 2017-05-09 10:07:30,126 Heartbeat.py:78 - Building Heartbeat: {responseId = 2, timestamp = 1494349650126, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:07:30,171 Controller.py:268 - Heartbeat response received (id = 3)
INFO 2017-05-09 10:07:40,172 Heartbeat.py:78 - Building Heartbeat: {responseId = 3, timestamp = 1494349660172, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:07:40,179 Controller.py:268 - Heartbeat response received (id = 4)
INFO 2017-05-09 10:07:50,180 Heartbeat.py:78 - Building Heartbeat: {responseId = 4, timestamp = 1494349670179, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:07:50,225 Controller.py:268 - Heartbeat response received (id = 5)
INFO 2017-05-09 10:08:00,226 Heartbeat.py:78 - Building Heartbeat: {responseId = 5, timestamp = 1494349680225, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:08:00,276 Controller.py:268 - Heartbeat response received (id = 6)
INFO 2017-05-09 10:08:10,279 Heartbeat.py:78 - Building Heartbeat: {responseId = 6, timestamp = 1494349690279, commandsInProgress = False, componentsMapped = True}
ERROR 2017-05-09 10:08:10,293 HostInfo.py:229 - Checking java processes failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 212, in javaProcs
    cmd = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
IOError: [Errno 2] No such file or directory: '/proc/30695/cmdline'
INFO 2017-05-09 10:08:10,520 logger.py:67 - call['test -w /'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:08:10,583 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:08:10,584 logger.py:67 - call['test -w /dev/shm'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:08:10,645 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:08:10,645 logger.py:67 - call['test -w /boot'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:08:10,706 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:08:10,707 logger.py:67 - call['test -w /data2'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:08:10,760 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:08:10,760 logger.py:67 - call['test -w /data3'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:08:10,806 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:08:10,915 Controller.py:268 - Heartbeat response received (id = 7)
INFO 2017-05-09 10:08:10,916 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:11,035 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:11,145 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:11,267 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:11,403 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:11,537 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:11,656 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:11,772 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:11,884 ActionQueue.py:99 - Adding STATUS_COMMAND for service TEZ of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:12,016 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:12,150 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:12,277 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:08:12,428 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
WARNING 2017-05-09 10:08:21,561 base_alert.py:140 - [Alert][mapreduce_history_server_rpc_latency] Unable to execute alert. [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:08:21,566 base_alert.py:140 - [Alert][mapreduce_history_server_cpu] Unable to execute alert. [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:08:21,573 base_alert.py:417 - [Alert][namenode_cpu] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:08:21,575 base_alert.py:140 - [Alert][namenode_cpu] Unable to execute alert. [Alert][namenode_cpu] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:08:21,597 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:08:21,599 base_alert.py:140 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:08:21,601 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:08:21,611 base_alert.py:140 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:08:21,624 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
ERROR 2017-05-09 10:08:21,636 script_alert.py:112 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://xxx.yyy.ZZZ:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 165, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen\n return _opener.open(url, data, timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 391, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.6/urllib2.py", line 409, in _open\n \'_open\', req)\n File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2017-05-09 10:08:21,642 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
WARNING 2017-05-09 10:08:21,650 base_alert.py:417 - [Alert][yarn_resourcemanager_cpu] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
WARNING 2017-05-09 10:08:21,653 base_alert.py:140 - [Alert][yarn_resourcemanager_cpu] Unable to execute alert. [Alert][yarn_resourcemanager_cpu] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:08:21,655 base_alert.py:417 - [Alert][yarn_resourcemanager_rpc_latency] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
WARNING 2017-05-09 10:08:21,656 base_alert.py:140 - [Alert][yarn_resourcemanager_rpc_latency] Unable to execute alert. [Alert][yarn_resourcemanager_rpc_latency] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:08:21,667 base_alert.py:140 - [Alert][ams_metrics_collector_hbase_master_cpu] Unable to execute alert. [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response
ERROR 2017-05-09 10:08:21,678 script_alert.py:112 - [Alert][ambari_agent_disk_usage] Failed with result CRITICAL: ['Capacity Used: [87.14%, 41.3 GB], Capacity Total: [47.4 GB], path=/usr/hdp']
INFO 2017-05-09 10:08:21,681 logger.py:67 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2017-05-09 10:08:21,682 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on xxx.yyy.ZZZ']
INFO 2017-05-09 10:08:23,008 Heartbeat.py:78 - Building Heartbeat: {responseId = 7, timestamp = 1494349703008, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:08:23,060 Controller.py:268 - Heartbeat response received (id = 8)
INFO 2017-05-09 10:08:33,061 Heartbeat.py:78 - Building Heartbeat: {responseId = 8, timestamp = 1494349713061, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:08:33,106 Controller.py:268 - Heartbeat response received (id = 9)
INFO 2017-05-09 10:08:43,107 Heartbeat.py:78 - Building Heartbeat: {responseId = 9, timestamp = 1494349723107, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:08:43,154 Controller.py:268 - Heartbeat response received (id = 10)
INFO 2017-05-09 10:08:53,155 Heartbeat.py:78 - Building Heartbeat: {responseId = 10, timestamp = 1494349733154, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:08:53,200 Controller.py:268 - Heartbeat response received (id = 11)
INFO 2017-05-09 10:09:03,201 Heartbeat.py:78 - Building Heartbeat: {responseId = 11, timestamp = 1494349743201, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:09:03,247 Controller.py:268 - Heartbeat response received (id = 12)
INFO 2017-05-09 10:09:13,249 Heartbeat.py:78 - Building Heartbeat: {responseId = 12, timestamp = 1494349753249, commandsInProgress = False, componentsMapped = True}
ERROR 2017-05-09 10:09:13,259 HostInfo.py:229 - Checking java processes failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 212, in javaProcs
    cmd = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
IOError: [Errno 2] No such file or directory: '/proc/3413/cmdline'
INFO 2017-05-09 10:09:13,519 logger.py:67 - call['test -w /'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:09:13,601 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:09:13,602 logger.py:67 - call['test -w /dev/shm'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:09:13,650 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:09:13,651 logger.py:67 - call['test -w /boot'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:09:13,736 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:09:13,737 logger.py:67 - call['test -w /data2'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:09:13,819 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:09:13,820 logger.py:67 - call['test -w /data3'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:09:13,894 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:09:13,991 Controller.py:268 - Heartbeat response received (id = 13)
INFO 2017-05-09 10:09:13,991 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:14,121 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:14,257 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:14,385 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:14,581 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:14,750 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:14,919 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:15,076 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:15,238 ActionQueue.py:99 - Adding STATUS_COMMAND for service TEZ of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:15,399 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:09:15,802 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
WARNING 2017-05-09 10:09:21,596 base_alert.py:417 - [Alert][namenode_hdfs_pending_deletion_blocks] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:09:21,598 base_alert.py:140 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:09:21,613 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:09:21,615 base_alert.py:140 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
ERROR 2017-05-09 10:09:21,620 script_alert.py:112 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['Data dir(s) not found: /hadoop/hdfs/data .']
WARNING 2017-05-09 10:09:21,628 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:09:21,632 base_alert.py:140 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:09:21,633 base_alert.py:417 - [Alert][namenode_hdfs_blocks_health] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:09:21,634 base_alert.py:140 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:09:21,650 base_alert.py:417 - [Alert][namenode_hdfs_capacity_utilization] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:09:21,653 base_alert.py:140 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:09:21,655 base_alert.py:417 - [Alert][namenode_rpc_latency] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:09:21,661 base_alert.py:140 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:09:21,665 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:09:21,674 base_alert.py:140 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:09:21,674 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
INFO 2017-05-09 10:09:21,700 logger.py:67 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2017-05-09 10:09:21,700 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on xxx.yyy.ZZZ']
ERROR 2017-05-09 10:09:21,701 script_alert.py:112 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://xxx.yyy.ZZZ:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 165, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen\n return _opener.open(url, data, timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 391, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.6/urllib2.py", line 409, in _open\n \'_open\', req)\n File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open\n raise URLError(err)\nURLError: \n)']
ERROR 2017-05-09 10:09:21,702 script_alert.py:112 - [Alert][ambari_agent_disk_usage] Failed with result CRITICAL: ['Capacity Used: [87.15%, 41.3 GB], Capacity Total: [47.4 GB], path=/usr/hdp']
INFO 2017-05-09 10:09:26,831 Heartbeat.py:78 - Building Heartbeat: {responseId = 13, timestamp = 1494349766831, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:09:26,898 Controller.py:268 - Heartbeat response received (id = 14)
INFO 2017-05-09 10:09:36,899 Heartbeat.py:78 - Building Heartbeat: {responseId = 14, timestamp = 1494349776899, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:09:36,948 Controller.py:268 - Heartbeat response received (id = 15)
INFO 2017-05-09 10:09:46,948 Heartbeat.py:78 - Building Heartbeat: {responseId = 15, timestamp = 1494349786948, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:09:46,992 Controller.py:268 - Heartbeat response received (id = 16)
INFO 2017-05-09 10:09:56,993 Heartbeat.py:78 - Building Heartbeat: {responseId = 16, timestamp = 1494349796993, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:09:57,038 Controller.py:268 - Heartbeat response received (id = 17)
INFO 2017-05-09 10:10:07,039 Heartbeat.py:78 - Building Heartbeat: {responseId = 17, timestamp = 1494349807039, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:10:07,136 Controller.py:268 - Heartbeat response received (id = 18)
INFO 2017-05-09 10:10:07,137 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:07,272 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:07,398 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:07,540 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:07,672 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:07,803 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:07,969 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:08,118 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:08,240 ActionQueue.py:99 - Adding STATUS_COMMAND for service TEZ of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:08,370 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:08,514 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:08,645 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:08,758 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:08,901 ActionQueue.py:99 - Adding STATUS_COMMAND for service ZOOKEEPER of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:10:19,290 Heartbeat.py:78 - Building Heartbeat: {responseId = 18, timestamp = 1494349819290, commandsInProgress = False, componentsMapped = True}
ERROR 2017-05-09 10:10:19,299 HostInfo.py:229 - Checking java processes failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 212, in javaProcs
    cmd = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
IOError: [Errno 2] No such file or directory: '/proc/11864/cmdline'
INFO 2017-05-09 10:10:19,556 logger.py:67 - call['test -w /'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:10:19,606 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:10:19,607 logger.py:67 - call['test -w /dev/shm'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:10:19,657 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:10:19,658 logger.py:67 - call['test -w /boot'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:10:19,718 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:10:19,718 logger.py:67 - call['test -w /data2'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:10:19,787 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:10:19,787 logger.py:67 - call['test -w /data3'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:10:19,851 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:10:19,871 Controller.py:268 - Heartbeat response received (id = 19)
WARNING 2017-05-09 10:10:21,585 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:10:21,592 base_alert.py:140 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:10:21,593 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:10:21,594 base_alert.py:140 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:10:21,627 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
ERROR 2017-05-09 10:10:21,643 script_alert.py:112 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://xxx.yyy.ZZZ:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 165, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen\n return _opener.open(url, data, timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 391, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.6/urllib2.py", line 409, in _open\n \'_open\', req)\n File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2017-05-09 10:10:21,643 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
INFO 2017-05-09 10:10:21,666 logger.py:67 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2017-05-09 10:10:21,667 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on xxx.yyy.ZZZ']
ERROR 2017-05-09 10:10:21,674 script_alert.py:112 - [Alert][ambari_agent_disk_usage] Failed with result CRITICAL: ['Capacity Used: [87.16%, 41.3 GB], Capacity Total: [47.4 GB], path=/usr/hdp']
INFO 2017-05-09 10:10:29,872 Heartbeat.py:78 - Building Heartbeat: {responseId = 19, timestamp = 1494349829871, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:10:29,917 Controller.py:268 - Heartbeat response received (id = 20)
INFO 2017-05-09 10:10:39,918 Heartbeat.py:78 - Building Heartbeat: {responseId = 20, timestamp = 1494349839918, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:10:39,963 Controller.py:268 - Heartbeat response received (id = 21)
INFO 2017-05-09 10:10:49,964 Heartbeat.py:78 - Building Heartbeat: {responseId = 21, timestamp = 1494349849963, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:10:50,009 Controller.py:268 - Heartbeat response received (id = 22)
INFO 2017-05-09 10:11:00,010 Heartbeat.py:78 - Building Heartbeat: {responseId = 22, timestamp = 1494349860009, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:11:00,056 Controller.py:268 - Heartbeat response received (id = 23)
INFO 2017-05-09 10:11:10,056 Heartbeat.py:78 - Building Heartbeat: {responseId = 23, timestamp = 1494349870056, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:11:10,188 Controller.py:268 - Heartbeat response received (id = 24)
INFO 2017-05-09 10:11:10,196 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:10,346 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:10,533 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:10,705 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:10,866 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:11,045 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:11,223 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:11,372 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:11,533 ActionQueue.py:99 - Adding STATUS_COMMAND for service TEZ of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:11,690 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:11,865 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:11:12,116 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue.
WARNING 2017-05-09 10:11:21,610 base_alert.py:417 - [Alert][namenode_hdfs_pending_deletion_blocks] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:11:21,612 base_alert.py:140 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:11:21,628 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:11:21,630 base_alert.py:140 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
ERROR 2017-05-09 10:11:21,639 script_alert.py:112 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['Data dir(s) not found: /hadoop/hdfs/data .']
WARNING 2017-05-09 10:11:21,641 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:11:21,643 base_alert.py:140 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:11:21,656 base_alert.py:417 - [Alert][namenode_hdfs_blocks_health] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:11:21,657 base_alert.py:140 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:11:21,657 base_alert.py:417 - [Alert][namenode_hdfs_capacity_utilization] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:11:21,666 base_alert.py:140 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:11:21,667 base_alert.py:417 - [Alert][namenode_rpc_latency] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:11:21,672 base_alert.py:140 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:11:21,679 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2017-05-09 10:11:21,696 base_alert.py:140 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2017-05-09 10:11:21,698 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
ERROR 2017-05-09 10:11:21,716 script_alert.py:112 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://xxx.yyy.ZZZ:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 165, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen\n return _opener.open(url, data, timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 391, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.6/urllib2.py", line 409, in _open\n \'_open\', req)\n File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open\n raise URLError(err)\nURLError: \n)']
INFO 2017-05-09 10:11:21,723 logger.py:67 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2017-05-09 10:11:21,724 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on xxx.yyy.ZZZ']
ERROR 2017-05-09 10:11:21,725 script_alert.py:112 - [Alert][ambari_agent_disk_usage] Failed with result CRITICAL: ['Capacity Used: [87.17%, 41.3 GB], Capacity Total: [47.4 GB], path=/usr/hdp']
INFO 2017-05-09 10:11:23,067 Heartbeat.py:78 - Building Heartbeat: {responseId = 24, timestamp = 1494349883067, commandsInProgress = False, componentsMapped = True}
ERROR 2017-05-09 10:11:23,078 HostInfo.py:229 - Checking java processes failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 212, in javaProcs
    cmd = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
IOError: [Errno 2] No such file or directory: '/proc/17666/cmdline'
INFO 2017-05-09 10:11:23,367 logger.py:67 - call['test -w /'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:11:23,443 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:11:23,444 logger.py:67 - call['test -w /dev/shm'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:11:23,521 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:11:23,521 logger.py:67 - call['test -w /boot'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:11:23,601 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:11:23,602 logger.py:67 - call['test -w /data2'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:11:23,664 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:11:23,665 logger.py:67 - call['test -w /data3'] {'sudo': True, 'timeout': 5}
INFO 2017-05-09 10:11:23,744 logger.py:67 - call returned (0, '')
INFO 2017-05-09 10:11:23,766 Controller.py:268 - Heartbeat response received (id = 25)
INFO 2017-05-09 10:11:33,767 Heartbeat.py:78 - Building Heartbeat: {responseId = 25, timestamp = 1494349893767, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:11:33,814 Controller.py:268 - Heartbeat response received (id = 26)
INFO 2017-05-09 10:11:43,815 Heartbeat.py:78 - Building Heartbeat: {responseId = 26, timestamp = 1494349903815, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:11:43,861 Controller.py:268 - Heartbeat response received (id = 27)
INFO 2017-05-09 10:11:53,862 Heartbeat.py:78 - Building Heartbeat: {responseId = 27, timestamp = 1494349913862, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:11:53,907 Controller.py:268 - Heartbeat response received (id = 28)
INFO 2017-05-09 10:12:03,909 Heartbeat.py:78 - Building Heartbeat: {responseId = 28, timestamp = 1494349923909, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:12:03,956 Controller.py:268 - Heartbeat response received (id = 29)
INFO 2017-05-09 10:12:13,957 Heartbeat.py:78 - Building Heartbeat: {responseId = 29, timestamp = 1494349933957, commandsInProgress = False, componentsMapped = True}
INFO 2017-05-09 10:12:14,052 Controller.py:268 - Heartbeat response received (id = 30)
INFO 2017-05-09 10:12:14,052 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:12:14,196 ActionQueue.py:99 - Adding STATUS_COMMAND for service AMBARI_METRICS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:12:14,351 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:12:14,476 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:12:14,589 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue.
INFO 2017-05-09 10:12:14,727 ActionQueue.py:99 - Adding STATUS_COMMAND for service HDFS of cluster XXXXX_Cluster_Service to the queue. INFO 2017-05-09 10:12:14,900 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue. INFO 2017-05-09 10:12:15,050 ActionQueue.py:99 - Adding STATUS_COMMAND for service MAPREDUCE2 of cluster XXXXX_Cluster_Service to the queue. INFO 2017-05-09 10:12:15,222 ActionQueue.py:99 - Adding STATUS_COMMAND for service TEZ of cluster XXXXX_Cluster_Service to the queue. INFO 2017-05-09 10:12:15,714 ActionQueue.py:99 - Adding STATUS_COMMAND for service YARN of cluster XXXXX_Cluster_Service to the queue. INFO 2017-05-09 10:12:16,530 ActionQueue.py:99 - Adding STATUS_COMMAND for service ZOOKEEPER of cluster XXXXX_Cluster_Service to the queue. WARNING 2017-05-09 10:12:21,697 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}} WARNING 2017-05-09 10:12:21,704 base_alert.py:140 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response WARNING 2017-05-09 10:12:21,705 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}} WARNING 2017-05-09 10:12:21,708 base_alert.py:140 - [Alert][namenode_directory_status] Unable to execute alert. 
[Alert][namenode_directory_status] Unable to extract JSON from JMX response WARNING 2017-05-09 10:12:21,730 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}} ERROR 2017-05-09 10:12:21,745 script_alert.py:112 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://xxx.yyy.ZZZ:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 165, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen\n return _opener.open(url, data, timeout)\n File "/usr/lib64/python2.6/urllib2.py", line 391, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.6/urllib2.py", line 409, in _open\n \'_open\', req)\n File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open\n raise URLError(err)\nURLError: \n)'] WARNING 2017-05-09 10:12:21,746 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}} INFO 2017-05-09 10:12:21,790 logger.py:67 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist ERROR 2017-05-09 10:12:21,790 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on xxx.yyy.ZZZ'] ERROR 2017-05-09 10:12:21,794 script_alert.py:112 - [Alert][ambari_agent_disk_usage] Failed with result CRITICAL: ['Capacity Used: [87.17%, 41.3 GB], Capacity Total: [47.4 GB], path=/usr/hdp']