2017-05-17 08:55:50,565 [CRITICAL] [HARD] [HIVE] [hive_server_process] (HiveServer2 Process) Connection failed on host nshk-1.openstacklocal:10000 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 200, in execute
    check_command_timeout=int(check_command_timeout))
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 74, in check_thrift_port_sasl
    timeout=check_command_timeout)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
    raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of '! beeline -u 'jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL'' returned 1.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Error: Could not open client transport with JDBC Uri: jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
)
2017-05-17 08:58:46,569 [OK] [HARD] [HDFS] [namenode_webui] (NameNode Web UI) HTTP 200 response in 0.000s
2017-05-17 08:58:46,569 [OK] [HARD] [HDFS] [datanode_health_summary] (DataNode Health Summary) All 0 DataNode(s) are healthy
2017-05-17 08:58:46,571 [OK] [HARD] [HDFS] [upgrade_finalized_state] (HDFS Upgrade Finalized State) HDFS cluster is not in the upgrade state
2017-05-17 08:58:46,571 [OK] [HARD] [HDFS] [namenode_last_checkpoint] (NameNode Last Checkpoint) Last Checkpoint: [0 hours, 0 minutes, 1 transactions]
2017-05-17 08:58:46,572 [OK] [HARD] [HDFS] [namenode_directory_status] (NameNode Directory Status) Directories are healthy
2017-05-17 08:58:47,557 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010
2017-05-17 08:58:47,559 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.000s response on port 2181
2017-05-17 08:58:47,561 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s
2017-05-17 08:58:47,562 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-3.openstacklocal
2017-05-17 08:58:47,583 [WARNING] [HARD] [ZOOKEEPER] [zookeeper_server_process_percent] (Percent ZooKeeper Servers Available) affected: [2], total: [3]
2017-05-17 08:58:48,553 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-2.openstacklocal
2017-05-17 08:58:51,561 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010
2017-05-17 08:58:51,563 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.000s response on port 2181
2017-05-17 08:58:51,566 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s
2017-05-17 08:58:51,573 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.001s
2017-05-17 08:58:51,574 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.001s response on port 2181
2017-05-17 08:58:51,575 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-4.openstacklocal
2017-05-17 08:58:51,577 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010
2017-05-17 08:58:51,578 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process_percent] (Percent ZooKeeper Servers Available) affected: [1], total: [3]
2017-05-17 08:58:51,585 [WARNING] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [1], total: [4]
2017-05-17 08:58:51,587 [OK] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [0], total: [3]
2017-05-17 08:58:53,553 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-1.openstacklocal
2017-05-17 08:58:53,559 [OK] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [0], total: [4]
2017-05-17 08:59:45,601 [OK] [HARD] [HDFS] [namenode_hdfs_pending_deletion_blocks] (HDFS Pending Deletion Blocks) Pending Deletion Blocks:[0]
2017-05-17 08:59:45,601 [OK] [HARD] [HDFS] [namenode_hdfs_blocks_health] (NameNode Blocks Health) Total Blocks:[5], Missing Blocks:[0]
2017-05-17 08:59:45,603 [OK] [HARD] [HDFS] [namenode_hdfs_capacity_utilization] (HDFS Capacity Utilization) Capacity Used:[0%, 612047256], Capacity Remaining:[196976988672]
2017-05-17 08:59:45,603 [OK] [HARD] [HDFS] [namenode_rpc_latency] (NameNode RPC Latency) Average Queue Time:[0.666666666667], Average Processing Time:[0.5]
2017-05-17 08:59:45,604 [OK] [HARD] [HDFS] [namenode_cpu] (NameNode Host CPU Utilization) 4 CPU, load 4.1%
2017-05-17 08:59:45,604 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) TCP OK - 0.000s response on port 19888
2017-05-17 08:59:46,564 [OK] [HARD] [YARN] [yarn_resourcemanager_webui] (ResourceManager Web UI) HTTP 200 response in 0.000s
2017-05-17 08:59:46,565 [OK] [HARD] [YARN] [yarn_resourcemanager_rpc_latency] (ResourceManager RPC Latency) Average Queue Time:[0.0], Average Processing Time:[0.0]
2017-05-17 08:59:46,566 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) HTTP 200 response in 0.000s
2017-05-17 08:59:46,567 [OK] [HARD] [YARN] [nodemanager_health_summary] (NodeManager Health Summary) All NodeManagers are healthy
2017-05-17 08:59:46,568 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_cpu] (History Server CPU Utilization) 4 CPU, load 4.1%
2017-05-17 08:59:46,569 [OK] [HARD] [YARN] [yarn_resourcemanager_cpu] (ResourceManager CPU Utilization) 4 CPU, load 4.1%
2017-05-17 08:59:46,570 [OK] [HARD] [YARN] [yarn_app_timeline_server_webui] (App Timeline Web UI) HTTP 200 response in 0.000s
2017-05-17 08:59:46,570 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_rpc_latency] (History Server RPC Latency) Average Queue Time:[0.0], Average Processing Time:[0.0]
2017-05-17 08:59:47,560 [OK] [HARD] [HDFS] [datanode_unmounted_data_dir] (DataNode Unmounted Data Dir) The following data dir(s) are valid: /hadoop/hdfs/data
2017-05-17 08:59:47,564 [OK] [HARD] [HDFS] [secondary_namenode_process] (Secondary NameNode Process) HTTP 200 response in 0.000s
2017-05-17 08:59:47,566 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[65709172224], Total Capacity:[11% Used, 73861871104]
2017-05-17 08:59:47,568 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[15%, 153.77765 MB], Max Heap: 1004.0 MB
2017-05-17 08:59:48,551 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s
2017-05-17 08:59:48,552 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy
2017-05-17 08:59:49,566 [OK] [HARD] [HDFS] [datanode_unmounted_data_dir] (DataNode Unmounted Data Dir) The following data dir(s) are valid: /hadoop/hdfs/data
2017-05-17 08:59:49,566 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[65494717952], Total Capacity:[11% Used, 73861871104]
2017-05-17 08:59:49,568 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_collector_hbase_master_cpu] (Metrics Collector - HBase Master CPU Utilization) 4 CPU, load 2.4%
2017-05-17 08:59:49,569 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[16%, 158.52066 MB], Max Heap: 1004.0 MB
2017-05-17 08:59:49,569 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_collector_hbase_master_process] (Metrics Collector - HBase Master Process) TCP OK - 0.000s response on port 61310
2017-05-17 08:59:51,568 [OK] [HARD] [HDFS] [datanode_unmounted_data_dir] (DataNode Unmounted Data Dir) The following data dir(s) are valid: /hadoop/hdfs/data
2017-05-17 08:59:51,568 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[65772856832], Total Capacity:[11% Used, 73861871104]
2017-05-17 08:59:51,572 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s
2017-05-17 08:59:51,574 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy
2017-05-17 08:59:51,574 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[4%, 43.15798 MB], Max Heap: 1004.0 MB
2017-05-17 08:59:51,580 [OK] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) affected: [0], total: [3]
2017-05-17 09:00:50,597 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_collector_process] (Metrics Collector Process) TCP OK - 0.000s response on port 6188
2017-05-17 09:00:50,600 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s
2017-05-17 09:00:50,602 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy
2017-05-17 09:00:50,627 [OK] [HARD] [AMBARI_METRICS] [grafana_webui] (Grafana Web UI) HTTP 200 response in 0.000s
2017-05-17 09:00:50,630 [OK] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [0], total: [3]
2017-05-17 09:01:51,551 [OK] [HARD] [HIVE] [hive_server_process] (HiveServer2 Process) TCP OK - 2.908s response on port 10000
2017-05-17 09:01:54,549 [OK] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore OK - Hive command took 7.475s
2017-05-17 09:02:46,553 [OK] [HARD] [HIVE] [hive_webhcat_server_status] (WebHCat Server Status) WebHCat status was OK (0.384s response from http://nshk-1.openstacklocal:50111/templeton/v1/status?user.name=ambari-qa)
2017-05-17 09:04:45,567 [OK] [HARD] [HDFS] [namenode_client_rpc_queue_latency_hourly] (NameNode Client RPC Queue Latency (Hourly)) There were no data points above the minimum threshold of 30 seconds
2017-05-17 09:04:45,567 [OK] [HARD] [HDFS] [namenode_client_rpc_processing_latency_hourly] (NameNode Client RPC Processing Latency (Hourly)) There were no data points above the minimum threshold of 30 seconds
2017-05-17 09:25:51,563 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) Connection failed to http://nshk-3.openstacklocal:8042 ()
2017-05-17 09:25:51,564 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) Connection failed to http://nshk-3.openstacklocal:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: )
2017-05-17 09:25:51,568 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [1], total: [3]
2017-05-17 09:26:52,590 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s
2017-05-17 09:26:52,593 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy
2017-05-17 09:26:52,597 [OK] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [0], total: [3]
2017-05-18 01:06:47,565 [OK] [HARD] [ZEPPELIN] [zeppelin_server_status] (Zeppelin Server Status) Successful connection to Zeppelin
2017-05-18 01:23:46,589 [CRITICAL] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) Connection failed to http://nshk-1.openstacklocal:19888 ()
2017-05-18 01:23:46,589 [CRITICAL] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) Connection failed: [Errno 111] Connection refused to nshk-1.openstacklocal:19888
2017-05-18 01:24:46,571 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) HTTP 200 response in 0.000s
2017-05-18 01:24:46,571 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) TCP OK - 0.000s response on port 19888
2017-05-18 03:34:46,584 [CRITICAL] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) Connection failed: [Errno 111] Connection refused to nshk-1.openstacklocal:19888
2017-05-18 03:34:46,584 [UNKNOWN] [HARD] [MAPREDUCE2] [mapreduce_history_server_cpu] (History Server CPU Utilization) [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response
2017-05-18 03:34:46,587 [UNKNOWN] [HARD] [MAPREDUCE2] [mapreduce_history_server_rpc_latency] (History Server RPC Latency) [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response
2017-05-18 03:34:46,587 [CRITICAL] [HARD] [YARN] [yarn_app_timeline_server_webui] (App Timeline Web UI) Connection failed to http://nshk-1.openstacklocal:8188/ws/v1/timeline ()
2017-05-18 03:34:46,587 [CRITICAL] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) Connection failed to http://nshk-1.openstacklocal:19888 ()
2017-05-18 03:34:47,554 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is NOT running on nshk-3.openstacklocal
2017-05-18 03:34:47,564 [WARNING] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [1], total: [4]
2017-05-18 03:34:47,576 [CRITICAL] [HARD] [HDFS] [secondary_namenode_process] (Secondary NameNode Process) Connection failed to http://nshk-2.openstacklocal:50090 ()
2017-05-18 03:34:47,576 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) Connection failed to http://nshk-2.openstacklocal:8042 ()
2017-05-18 03:34:47,579 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) Connection failed to http://nshk-2.openstacklocal:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: )
2017-05-18 03:34:47,587 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [1], total: [3]
2017-05-18 03:34:48,552 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is NOT running on nshk-2.openstacklocal
2017-05-18 03:34:48,559 [CRITICAL] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [2], total: [4]
2017-05-18 03:34:49,570 [CRITICAL] [HARD] [HIVE] [hive_server_process] (HiveServer2 Process) Connection failed on host nshk-1.openstacklocal:10000 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 200, in execute
    check_command_timeout=int(check_command_timeout))
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 74, in check_thrift_port_sasl
    timeout=check_command_timeout)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
    raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of '! beeline -u 'jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL'' returned 1.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Error: Could not open client transport with JDBC Uri: jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
)
2017-05-18 03:34:49,579 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) Connection failed to http://nshk-4.openstacklocal:8042 ()
2017-05-18 03:34:49,581 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) Connection failed to http://nshk-4.openstacklocal:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: )
2017-05-18 03:34:50,564 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is NOT running on nshk-4.openstacklocal
2017-05-18 03:34:51,555 [CRITICAL] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore on nshk-1.openstacklocal failed (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 198, in execute
    timeout=int(check_command_timeout) )
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
    raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of 'export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://nshk-1.openstacklocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;'' returned 1.
Logging initialized using configuration in file:/etc/hive/2.5.3.0-37/0/conf.server/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1551)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:89)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:135)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:107)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3252)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3271)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:524)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1549)
    ... 14 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:446)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:244)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1549)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:89)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:135)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:107)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3252)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3271)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:524)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
    ... 22 more
)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:492)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:244)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    ... 19 more
)
2017-05-18 03:34:51,559 [CRITICAL] [HARD] [AMBARI_METRICS] [grafana_webui] (Grafana Web UI) Connection failed to http://nshk-3.openstacklocal:3000 ()
2017-05-18 03:34:51,559 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) Connection failed to http://nshk-3.openstacklocal:8042 ()
2017-05-18 03:34:51,562 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) Connection failed to http://nshk-3.openstacklocal:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: )
2017-05-18 03:34:53,550 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is NOT running on nshk-1.openstacklocal
2017-05-18 03:35:46,603 [UNKNOWN] [HARD] [HDFS] [namenode_last_checkpoint] (NameNode Last Checkpoint) Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/alerts/alert_checkpoint_time.py", line 203, in execute
    "LastCheckpointTime", connection_timeout))
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/alerts/alert_checkpoint_time.py", line 246, in get_value_from_jmx
    response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError:
2017-05-18 03:35:46,603 [CRITICAL] [HARD] [HDFS] [namenode_webui] (NameNode Web UI) Connection failed to http://nshk-1.openstacklocal:50070 ()
2017-05-18 03:35:46,605 [UNKNOWN] [HARD] [HDFS] [datanode_health_summary] (DataNode Health Summary) [Alert][datanode_health_summary] Unable to extract JSON from JMX response
2017-05-18 03:35:46,606 [UNKNOWN] [HARD] [HDFS] [namenode_hdfs_pending_deletion_blocks] (HDFS Pending Deletion Blocks) [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
2017-05-18 03:35:46,606 [UNKNOWN] [HARD] [HDFS] [upgrade_finalized_state] (HDFS Upgrade Finalized State) Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/alerts/alert_upgrade_finalized.py", line 140, in execute
    "UpgradeFinalized"))
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/alerts/alert_upgrade_finalized.py", line 169, in get_value_from_jmx
    response = urllib2.urlopen(query, timeout=int(CONNECTION_TIMEOUT_DEFAULT))
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError:
2017-05-18 03:35:46,608 [CRITICAL] [HARD] [YARN] [yarn_resourcemanager_webui] (ResourceManager Web UI) Connection failed to http://nshk-1.openstacklocal:8088 ()
2017-05-18 03:35:46,608 [UNKNOWN] [HARD] [HDFS] [namenode_hdfs_blocks_health] (NameNode Blocks Health) [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
2017-05-18 03:35:46,612 [CRITICAL] [HARD] [HIVE] [hive_webhcat_server_status] (WebHCat Server Status) Connection failed to http://nshk-1.openstacklocal:50111/templeton/v1/status?user.name=ambari-qa + Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute
    url_response = urllib2.urlopen(query_url, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError:
2017-05-18 03:35:46,613 [UNKNOWN] [HARD] [HDFS] [namenode_hdfs_capacity_utilization] (HDFS Capacity Utilization) [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
2017-05-18 03:35:46,614 [UNKNOWN] [HARD] [YARN] [nodemanager_health_summary] (NodeManager Health Summary) Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py", line 155, in execute
    "LiveNodeManagers", connection_timeout))
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py", line 195, in get_value_from_jmx
    response = url_opener.open(query, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError:
2017-05-18 03:35:46,614 [UNKNOWN] [HARD] [HDFS] [namenode_rpc_latency] (NameNode RPC Latency) [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
2017-05-18 03:35:46,615 [UNKNOWN] [HARD] [HDFS] [namenode_directory_status] (NameNode Directory Status) [Alert][namenode_directory_status] Unable to extract JSON from JMX response
2017-05-18 03:35:47,567 [CRITICAL] [HARD] [SPARK] [SPARK_JOBHISTORYSERVER_PROCESS] (Spark History Server) Connection failed: [Errno 111] Connection refused to nshk-1.openstacklocal:18080
2017-05-18 03:35:47,568 [CRITICAL] [HARD] [ZEPPELIN] [zeppelin_server_status] (Zeppelin Server Status) Zeppelin is not running
2017-05-18 03:35:47,569 [CRITICAL] [HARD] [HDFS] [datanode_process] (DataNode Process) Connection failed: [Errno 111] Connection refused to nshk-2.openstacklocal:50010
2017-05-18 03:35:47,570 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response
2017-05-18 03:35:47,572 [CRITICAL] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) Connection failed: [Errno 111] Connection refused to nshk-2.openstacklocal:2181
2017-05-18 03:35:47,573 [CRITICAL] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) Connection failed to http://nshk-2.openstacklocal:50075 ()
2017-05-18 03:35:47,574 [UNKNOWN] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
2017-05-18 03:35:47,580 [CRITICAL] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [1], total: [3]
2017-05-18 03:35:47,582 [UNKNOWN] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) There are alerts with a state of UNKNOWN.
2017-05-18 03:35:49,586 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_collector_process] (Metrics Collector Process) Connection failed: [Errno 111] Connection refused to nshk-4.openstacklocal:6188
2017-05-18 03:35:49,586 [UNKNOWN] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
2017-05-18 03:35:49,588 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response
2017-05-18 03:35:49,589 [CRITICAL] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) Connection failed: [Errno 111] Connection refused to nshk-4.openstacklocal:2181
2017-05-18 03:35:49,591 [CRITICAL] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) Connection failed to http://nshk-4.openstacklocal:50075 ()
2017-05-18 03:35:49,592 [CRITICAL] [HARD] [HDFS] [datanode_process] (DataNode Process) Connection failed: [Errno 111] Connection refused to nshk-4.openstacklocal:50010
2017-05-18 03:35:49,594 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_collector_hbase_master_process] (Metrics Collector - HBase Master Process) Connection failed: [Errno 111] Connection refused to nshk-4.openstacklocal:61310
2017-05-18 03:35:49,599 [WARNING] [HARD] [ZOOKEEPER] [zookeeper_server_process_percent] (Percent ZooKeeper Servers Available) affected: [2], total: [3]
2017-05-18 03:35:51,559 [UNKNOWN] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
2017-05-18
03:35:51,559 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response 2017-05-18 03:35:51,561 [CRITICAL] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) Connection failed: [Errno 111] Connection refused to nshk-3.openstacklocal:2181 2017-05-18 03:35:51,562 [CRITICAL] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) Connection failed to http://nshk-3.openstacklocal:50075 () 2017-05-18 03:35:51,564 [CRITICAL] [HARD] [HDFS] [datanode_process] (DataNode Process) Connection failed: [Errno 111] Connection refused to nshk-3.openstacklocal:50010 2017-05-18 03:35:51,568 [CRITICAL] [HARD] [ZOOKEEPER] [zookeeper_server_process_percent] (Percent ZooKeeper Servers Available) affected: [3], total: [3] 2017-05-18 03:46:08,393 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-2.openstacklocal 2017-05-18 03:46:10,281 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-3.openstacklocal 2017-05-18 03:46:10,350 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-1.openstacklocal 2017-05-18 03:46:10,361 [WARNING] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [1], total: [4] 2017-05-18 03:46:12,247 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-4.openstacklocal 2017-05-18 03:46:12,253 [OK] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [0], total: [4] 2017-05-18 03:46:13,244 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.000s response on port 2181 2017-05-18 03:46:13,253 [WARNING] [HARD] [ZOOKEEPER] [zookeeper_server_process_percent] (Percent ZooKeeper Servers 
Available) affected: [2], total: [3] 2017-05-18 03:47:08,299 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s 2017-05-18 03:47:08,299 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010 2017-05-18 03:47:08,304 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[64550987264], Total Capacity:[13% Used, 73861871104] 2017-05-18 03:47:08,308 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.000s response on port 2181 2017-05-18 03:47:08,312 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[14%, 138.74535 MB], Max Heap: 1004.0 MB 2017-05-18 03:47:08,323 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process_percent] (Percent ZooKeeper Servers Available) affected: [1], total: [3] 2017-05-18 03:47:10,264 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010 2017-05-18 03:47:11,246 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[14%, 142.82831 MB], Max Heap: 1004.0 MB 2017-05-18 03:47:11,249 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[64538404352], Total Capacity:[13% Used, 73861871104] 2017-05-18 03:47:11,252 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.000s response on port 2181 2017-05-18 03:47:11,261 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s 2017-05-18 03:47:11,286 [OK] [HARD] [HDFS] [namenode_webui] (NameNode Web UI) HTTP 200 response in 0.000s 2017-05-18 03:47:11,286 [OK] [HARD] [HDFS] [datanode_health_summary] (DataNode Health Summary) All 3 DataNode(s) are healthy 2017-05-18 03:47:11,289 [OK] [HARD] [HDFS] [upgrade_finalized_state] (HDFS Upgrade Finalized State) HDFS cluster is not in the upgrade state 2017-05-18 03:47:11,289 [OK] [HARD] [HDFS] [namenode_hdfs_pending_deletion_blocks] (HDFS Pending Deletion Blocks) Pending 
Deletion Blocks:[0] 2017-05-18 03:47:11,290 [OK] [HARD] [HDFS] [namenode_last_checkpoint] (NameNode Last Checkpoint) Last Checkpoint: [2 hours, 23 minutes, 2609 transactions] 2017-05-18 03:47:11,291 [OK] [HARD] [HDFS] [namenode_hdfs_blocks_health] (NameNode Blocks Health) Total Blocks:[65], Missing Blocks:[0] 2017-05-18 03:47:11,292 [OK] [HARD] [HDFS] [namenode_hdfs_capacity_utilization] (HDFS Capacity Utilization) Capacity Used:[1%, 1832345600], Capacity Remaining:[192214778368] 2017-05-18 03:47:11,292 [OK] [HARD] [HDFS] [namenode_rpc_latency] (NameNode RPC Latency) Average Queue Time:[0.6], Average Processing Time:[0.3] 2017-05-18 03:47:11,293 [OK] [HARD] [HDFS] [namenode_directory_status] (NameNode Directory Status) Directories are healthy 2017-05-18 03:47:12,272 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s 2017-05-18 03:47:12,272 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[63125370368], Total Capacity:[15% Used, 73861871104] 2017-05-18 03:47:12,275 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[14%, 142.82617 MB], Max Heap: 1004.0 MB 2017-05-18 03:47:12,276 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010 2017-05-18 03:47:12,282 [OK] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) affected: [0], total: [3] 2017-05-18 03:47:12,283 [OK] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [0], total: [3] 2017-05-18 03:48:08,287 [OK] [HARD] [HDFS] [secondary_namenode_process] (Secondary NameNode Process) HTTP 200 response in 0.000s 2017-05-18 03:48:08,287 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy 2017-05-18 03:48:09,240 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s 2017-05-18 03:48:10,281 [OK] [HARD] [SPARK] [SPARK_JOBHISTORYSERVER_PROCESS] (Spark History Server) TCP OK - 
0.000s response on port 18080 2017-05-18 03:48:10,281 [OK] [HARD] [YARN] [yarn_resourcemanager_webui] (ResourceManager Web UI) HTTP 200 response in 0.000s 2017-05-18 03:48:10,282 [OK] [HARD] [YARN] [nodemanager_health_summary] (NodeManager Health Summary) All NodeManagers are healthy 2017-05-18 03:48:10,284 [OK] [HARD] [YARN] [yarn_app_timeline_server_webui] (App Timeline Web UI) HTTP 200 response in 0.000s 2017-05-18 03:48:10,284 [OK] [HARD] [ZEPPELIN] [zeppelin_server_status] (Zeppelin Server Status) Successful connection to Zeppelin 2017-05-18 03:48:10,286 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) HTTP 200 response in 0.000s 2017-05-18 03:48:10,287 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) TCP OK - 0.000s response on port 19888 2017-05-18 03:48:11,251 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s 2017-05-18 03:48:11,257 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy 2017-05-18 03:48:13,257 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_collector_hbase_master_process] (Metrics Collector - HBase Master Process) TCP OK - 0.000s response on port 61310 2017-05-18 03:48:18,236 [OK] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore OK - Hive command took 7.761s 2017-05-18 03:50:10,322 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_rpc_latency] (History Server RPC Latency) Average Queue Time:[0.0], Average Processing Time:[0.0] 2017-05-18 03:50:10,322 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_cpu] (History Server CPU Utilization) 4 CPU, load 12.2% 2017-05-18 03:50:13,262 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_collector_process] (Metrics Collector Process) TCP OK - 0.000s response on port 6188 2017-05-18 03:51:10,276 [OK] [HARD] [AMBARI_METRICS] [grafana_webui] (Grafana Web UI) HTTP 200 response in 0.000s 2017-05-18 03:51:10,277 [OK] [HARD] [HIVE] 
[hive_webhcat_server_status] (WebHCat Server Status) WebHCat status was OK (0.463s response from http://nshk-1.openstacklocal:50111/templeton/v1/status?user.name=ambari-qa) 2017-05-18 03:51:13,433 [OK] [HARD] [HIVE] [hive_server_process] (HiveServer2 Process) TCP OK - 2.578s response on port 10000 2017-05-18 03:51:13,435 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s 2017-05-18 03:51:13,437 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy 2017-05-18 03:51:13,450 [OK] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [0], total: [3] 2017-05-18 06:43:10,274 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is NOT running on nshk-2.openstacklocal 2017-05-18 06:43:10,282 [CRITICAL] [HARD] [AMBARI_METRICS] [grafana_webui] (Grafana Web UI) Connection failed to http://nshk-3.openstacklocal:3000 () 2017-05-18 06:43:10,284 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is NOT running on nshk-3.openstacklocal 2017-05-18 06:43:10,289 [WARNING] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [1], total: [4] 2017-05-18 06:43:10,293 [CRITICAL] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [2], total: [4] 2017-05-18 06:43:11,300 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is NOT running on nshk-1.openstacklocal 2017-05-18 06:43:12,252 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is NOT running on nshk-4.openstacklocal 2017-05-18 06:43:12,252 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_collector_process] (Metrics Collector Process) Connection failed: [Errno 111] Connection refused to nshk-4.openstacklocal:6188 
2017-05-18 06:45:13,319 [UNKNOWN] [HARD] [HDFS] [namenode_client_rpc_processing_latency_hourly] (NameNode Client RPC Processing Latency (Hourly)) Unable to retrieve metrics from the Ambari Metrics service.
2017-05-18 06:45:13,319 [UNKNOWN] [HARD] [HDFS] [namenode_client_rpc_queue_latency_hourly] (NameNode Client RPC Queue Latency (Hourly)) Unable to retrieve metrics from the Ambari Metrics service.
2017-05-18 06:45:17,556 [UNKNOWN] [HARD] [AMBARI_METRICS] [ams_metrics_collector_hbase_master_cpu] (Metrics Collector - HBase Master CPU Utilization) [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response
2017-05-18 07:13:13,796 [CRITICAL] [HARD] [AMBARI_METRICS] [ams_metrics_collector_hbase_master_process] (Metrics Collector - HBase Master Process) Connection failed: [Errno 111] Connection refused to nshk-4.openstacklocal:61310
2017-05-18 07:36:49,416 [CRITICAL] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) nshk-2.openstacklocal is not sending heartbeats
2017-05-18 07:36:49,429 [CRITICAL] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) nshk-3.openstacklocal is not sending heartbeats
2017-05-18 07:36:49,435 [CRITICAL] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) nshk-4.openstacklocal is not sending heartbeats
2017-05-18 07:36:49,437 [CRITICAL] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) nshk-1.openstacklocal is not sending heartbeats
2017-05-18 07:39:49,426 [CRITICAL] [HARD] [AMBARI] [ambari_server_stale_alerts] (Ambari Server Alerts) There are 54 stale alerts from 4 host(s): nshk-1.openstacklocal [App Timeline Web UI (13m), DataNode Health Summary (13m), HDFS Capacity Utilization (13m), HDFS Pending Deletion Blocks (13m), HDFS Upgrade Finalized State (13m), History Server Process (13m), History Server Web UI (13m), Host Disk Usage (13m), Metrics Monitor Status (13m), NameNode Blocks Health (13m), NameNode Directory Status (13m), NameNode Last Checkpoint (13m), NameNode RPC Latency (13m), NameNode Web UI (13m), NodeManager Health Summary (13m), ResourceManager Web UI (13m), Spark History Server (13m), WebHCat Server Status (13m), Zeppelin Server Status (13m)], nshk-2.openstacklocal [DataNode Heap Usage (13m), DataNode Process (13m), DataNode Storage (13m), DataNode Unmounted Data Dir (13m), DataNode Web UI (13m), Host Disk Usage (13m), Metrics Monitor Status (13m), NodeManager Health (13m), NodeManager Web UI (13m), Secondary NameNode Process (13m), ZooKeeper Server Process (13m)], nshk-3.openstacklocal [DataNode Heap Usage (13m), DataNode Process (13m), DataNode Storage (13m), DataNode Unmounted Data Dir (13m), DataNode Web UI (13m), Grafana Web UI (13m), Host Disk Usage (13m), Metrics Monitor Status (13m), NodeManager Health (13m), NodeManager Web UI (13m), ZooKeeper Server Process (13m)], nshk-4.openstacklocal [DataNode Heap Usage (13m), DataNode Process (13m), DataNode Storage (13m), DataNode Unmounted Data Dir (13m), DataNode Web UI (13m), Host Disk Usage (13m), Metrics Collector - Auto-Restart Status (13m), Metrics Collector - HBase Master Process (13m), Metrics Collector Process (13m), Metrics Monitor Status (13m), NodeManager Health (13m), NodeManager Web UI (13m), ZooKeeper Server Process (13m)]
2017-05-19 02:10:08,253 [OK] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) nshk-2.openstacklocal is healthy
2017-05-19 02:10:08,286 [OK] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) nshk-3.openstacklocal is healthy
2017-05-19 02:10:08,304 [OK] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) nshk-4.openstacklocal is healthy
2017-05-19 02:10:08,307 [OK] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) nshk-1.openstacklocal is healthy
2017-05-19 02:13:04,771 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-1.openstacklocal
2017-05-19 02:13:15,741 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-2.openstacklocal
2017-05-19 02:13:28,750 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-3.openstacklocal
2017-05-19 02:13:28,761 [WARNING] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [1], total: [4]
2017-05-19 02:13:38,760 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_collector_hbase_master_process] (Metrics Collector - HBase Master Process) TCP OK - 0.000s response on port 61310
2017-05-19 02:14:39,747 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_collector_hbase_master_cpu] (Metrics Collector - HBase Master CPU Utilization) 4 CPU, load 2.2%
2017-05-19 02:15:38,731 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_monitor_process] (Metrics Monitor Status) Ambari Monitor is running on nshk-4.openstacklocal
2017-05-19 02:15:38,741 [OK] [HARD] [AMBARI_METRICS] [metrics_monitor_process_percent] (Percent Metrics Monitors Available) affected: [0], total: [4]
2017-05-19 02:16:08,167 [OK] [HARD] [AMBARI] [ambari_server_stale_alerts] (Ambari Server Alerts) All alerts have run within their time intervals.
2017-05-19 02:16:28,747 [OK] [HARD] [AMBARI_METRICS] [grafana_webui] (Grafana Web UI) HTTP 200 response in 0.000s
2017-05-19 02:16:39,738 [OK] [HARD] [AMBARI_METRICS] [ams_metrics_collector_process] (Metrics Collector Process) TCP OK - 0.000s response on port 6188
2017-05-19 02:29:38,780 [UNKNOWN] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
2017-05-19 02:29:38,781 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response
2017-05-19 02:29:38,784 [CRITICAL] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) Connection failed to http://nshk-4.openstacklocal:50075 ()
2017-05-19 02:29:38,785 [CRITICAL] [HARD] [HDFS] [datanode_process] (DataNode Process) Connection failed: [Errno 111] Connection refused to nshk-4.openstacklocal:50010
2017-05-19 02:29:38,796 [UNKNOWN] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) There are alerts with a state of UNKNOWN.
2017-05-19 02:29:38,798 [CRITICAL] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [1], total: [3]
2017-05-19 02:30:09,716 [CRITICAL] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore on nshk-1.openstacklocal failed (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 200, in execute timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE, File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call raise ExecutionFailed(err_msg, code, out, err) ExecutionFailed: Execution of 'export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://nshk-1.openstacklocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;'' returned 1. Logging initialized using configuration in file:/etc/hive/2.5.3.0-37/0/hive-log4j.properties Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/ambari-qa/1be41c49-9952-4114-81f3-3e78f977a49e. Name node is in safe mode. The reported blocks 0 needs additional 68 blocks to reach the threshold 1.0000 of total blocks 67. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1359) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4010) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1102) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/ambari-qa/1be41c49-9952-4114-81f3-3e78f977a49e. Name node is in safe mode. The reported blocks 0 needs additional 68 blocks to reach the threshold 1.0000 of total blocks 67. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1359) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4010) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1102) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552) at org.apache.hadoop.ipc.Client.call(Client.java:1496) at org.apache.hadoop.ipc.Client.call(Client.java:1396) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy12.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:603) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176) at com.sun.proxy.$Proxy13.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3061) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:3031) at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1162) at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1158) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1158) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1150) at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:671) at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:596) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529) ... 8 more )
2017-05-19 02:30:38,794 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010
2017-05-19 02:30:38,804 [OK] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [0], total: [3]
2017-05-19 02:30:39,708 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s
2017-05-19 02:31:38,729 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[63441966592], Total Capacity:[14% Used, 73861871104]
2017-05-19 02:31:38,731 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[6%, 57.382637 MB], Max Heap: 1004.0 MB
2017-05-19 02:31:38,744 [OK] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) affected: [0], total: [3]
2017-05-19 02:33:11,726 [OK] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore OK - Hive command took 6.848s
2017-05-19 02:33:38,732 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) Connection failed to http://nshk-4.openstacklocal:8042 ()
2017-05-19 02:33:38,733 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) Connection failed to http://nshk-4.openstacklocal:8042/ws/v1/node/info (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute url_response = urllib2.urlopen(query, timeout=connection_timeout) File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib64/python2.6/urllib2.py", line 391, in open response = self._open(req, data) File "/usr/lib64/python2.6/urllib2.py", line 409, in _open '_open', req) File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain result = func(*args) File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open raise URLError(err) URLError: )
2017-05-19 02:33:38,738 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [1], total: [3]
2017-05-19 02:34:39,979 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s
2017-05-19 02:34:39,981 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy
2017-05-19 02:34:39,991 [OK] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [0], total: [3]
2017-05-19 03:57:09,728 [CRITICAL] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore on nshk-1.openstacklocal failed (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 200, in execute timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE, File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call raise ExecutionFailed(err_msg, code, out, err) ExecutionFailed: Execution of 'export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://nshk-1.openstacklocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;'' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender. Logging initialized using configuration in file:/etc/hive/2.5.3.0-37/0/hive-log4j.properties Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/ambari-qa/f042b005-5b85-43e4-b3fc-7353a615d101. Name node is in safe mode. It was turned on manually. Use "hdfs dfsadmin -safemode leave" to turn safe mode off. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1359) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4010) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1102) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/ambari-qa/f042b005-5b85-43e4-b3fc-7353a615d101. Name node is in safe mode. It was turned on manually. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1359) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4010) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1102) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552) at org.apache.hadoop.ipc.Client.call(Client.java:1496) at org.apache.hadoop.ipc.Client.call(Client.java:1396) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy12.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:603) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278) at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176) at com.sun.proxy.$Proxy13.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3061) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:3031) at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1162) at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1158) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1158) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1150) at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:671) at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:596) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529) ... 
8 more ) 2017-05-19 04:03:11,787 [OK] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore OK - Hive command took 7.186s 2017-05-19 07:57:12,711 [CRITICAL] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore on nshk-1.openstacklocal failed (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 200, in execute timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE, File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call raise ExecutionFailed(err_msg, code, out, err) ExecutionFailed: Execution of 'export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://nshk-1.openstacklocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 
--hiveconf hive.execution.engine=mr -e 'show databases;'' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender. Logging initialized using configuration in file:/etc/hive/2.5.3.0-37/0/hive-log4j.properties Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1551) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:89) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:135) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:107) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3252) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3271) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:524) ... 
8 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1549) ... 14 more Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused) at org.apache.thrift.transport.TSocket.open(TSocket.java:226) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:446) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:244) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1549) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:89) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:135) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:107) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3252) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3271) at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:524) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ... 22 more ) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:492) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:244) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) ... 
19 more ) 2017-05-19 08:00:11,719 [OK] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore OK - Hive command took 7.058s 2017-05-19 08:13:28,752 [CRITICAL] [HARD] [HDFS] [datanode_process] (DataNode Process) Connection failed: [Errno 111] Connection refused to nshk-3.openstacklocal:50010 2017-05-19 08:13:28,757 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response 2017-05-19 08:13:28,759 [CRITICAL] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) Connection failed to http://nshk-3.openstacklocal:50075 () 2017-05-19 08:13:28,761 [UNKNOWN] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) [Alert][datanode_heap_usage] Unable to extract JSON from JMX response 2017-05-19 08:13:28,768 [CRITICAL] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [1], total: [3] 2017-05-19 08:13:28,774 [UNKNOWN] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) There are alerts with a state of UNKNOWN. 
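The Metastore failures above bottom out in HDFS safe mode ("Name node is in safe mode. It was turned on manually."), and the log itself names the remedy: `hdfs dfsadmin -safemode leave`. A minimal triage sketch under that assumption; `needs_safemode_leave` is a hypothetical helper (not part of Hadoop) that inspects the captured output of `hdfs dfsadmin -safemode get` so the decision can be scripted:

```shell
# Hypothetical helper: given the captured output of
# `hdfs dfsadmin -safemode get`, print "leave" when the NameNode
# reports safe mode ON, "ok" otherwise.
needs_safemode_leave() {
  if printf '%s\n' "$1" | grep -q 'Safe mode is ON'; then
    echo leave
  else
    echo ok
  fi
}

# Sketch of the actual remediation, assuming the hdfs client is on
# PATH and this runs as a user with dfsadmin rights:
#   state="$(hdfs dfsadmin -safemode get)"
#   [ "$(needs_safemode_leave "$state")" = leave ] && hdfs dfsadmin -safemode leave
```

Leaving safe mode manually is only appropriate when it was enabled manually, as here; if the NameNode entered safe mode on its own (missing blocks, low resources), the underlying cause should be fixed first.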
2017-05-19 08:13:39,728 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response 2017-05-19 08:14:28,762 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s 2017-05-19 08:14:28,762 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010 2017-05-19 08:14:28,773 [OK] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [0], total: [3] 2017-05-19 08:15:04,767 [UNKNOWN] [HARD] [YARN] [nodemanager_health_summary] (NodeManager Health Summary) Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py", line 155, in execute "LiveNodeManagers", connection_timeout)) File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py", line 195, in get_value_from_jmx response = url_opener.open(query, timeout=connection_timeout) File "/usr/lib64/python2.6/urllib2.py", line 391, in open response = self._open(req, data) File "/usr/lib64/python2.6/urllib2.py", line 409, in _open '_open', req) File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain result = func(*args) File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open raise URLError(err) URLError: 2017-05-19 08:15:04,767 [CRITICAL] [HARD] [YARN] [yarn_resourcemanager_webui] (ResourceManager Web UI) Connection failed to http://nshk-1.openstacklocal:8088 () 2017-05-19 08:15:04,770 [CRITICAL] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) Connection failed: [Errno 111] Connection refused to nshk-1.openstacklocal:19888 2017-05-19 08:15:04,770 [CRITICAL] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) Connection failed to 
http://nshk-1.openstacklocal:19888 () 2017-05-19 08:15:28,778 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[61880923648], Total Capacity:[16% Used, 73861871104] 2017-05-19 08:15:28,783 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[8%, 82.1604 MB], Max Heap: 1004.0 MB 2017-05-19 08:15:39,741 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[61017716224], Total Capacity:[17% Used, 73861871104] 2017-05-19 08:15:39,757 [OK] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) affected: [0], total: [3] 2017-05-19 08:16:03,739 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) TCP OK - 0.000s response on port 19888 2017-05-19 08:16:04,713 [OK] [HARD] [YARN] [nodemanager_health_summary] (NodeManager Health Summary) All NodeManagers are healthy 2017-05-19 08:16:04,713 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) HTTP 200 response in 0.000s 2017-05-19 08:16:04,715 [OK] [HARD] [YARN] [yarn_resourcemanager_webui] (ResourceManager Web UI) HTTP 200 response in 0.000s 2017-05-19 10:09:04,893 [UNKNOWN] [HARD] [HDFS] [namenode_client_rpc_processing_latency_daily] (NameNode Client RPC Processing Latency (Daily)) Properties file doesn't contain namenode.sink.timeline.collector.hosts. Can't define metric collector hosts 2017-05-19 10:09:04,894 [UNKNOWN] [HARD] [HDFS] [namenode_client_rpc_queue_latency_daily] (NameNode Client RPC Queue Latency (Daily)) Properties file doesn't contain namenode.sink.timeline.collector.hosts. Can't define metric collector hosts 2017-05-19 10:09:04,895 [UNKNOWN] [HARD] [HDFS] [increase_nn_heap_usage_daily] (NameNode Heap Usage (Daily)) Properties file doesn't contain namenode.sink.timeline.collector.hosts. 
Can't define metric collector hosts 2017-05-19 10:09:04,896 [UNKNOWN] [HARD] [HDFS] [namenode_increase_in_storage_capacity_usage_daily] (HDFS Storage Capacity Usage (Daily)) Properties file doesn't contain namenode.sink.timeline.collector.hosts. Can't define metric collector hosts 2017-05-19 11:14:03,724 [CRITICAL] [HARD] [SPARK] [SPARK_JOBHISTORYSERVER_PROCESS] (Spark History Server) Connection failed: [Errno 111] Connection refused to nshk-1.openstacklocal:18080 2017-05-19 11:14:03,724 [CRITICAL] [HARD] [ZEPPELIN] [zeppelin_server_status] (Zeppelin Server Status) Zeppelin is not running 2017-05-19 11:14:04,738 [CRITICAL] [HARD] [HIVE] [hive_webhcat_server_status] (WebHCat Server Status) Connection failed to http://nshk-1.openstacklocal:50111/templeton/v1/status?user.name=ambari-qa + Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute url_response = urllib2.urlopen(query_url, timeout=connection_timeout) File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib64/python2.6/urllib2.py", line 391, in open response = self._open(req, data) File "/usr/lib64/python2.6/urllib2.py", line 409, in _open '_open', req) File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain result = func(*args) File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open raise URLError(err) URLError: 2017-05-19 11:14:28,735 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) Connection failed to http://nshk-3.openstacklocal:8042 () 2017-05-19 11:14:28,738 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) Connection failed to http://nshk-3.openstacklocal:8042/ws/v1/node/info (Traceback (most recent call last): File 
"/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute url_response = urllib2.urlopen(query, timeout=connection_timeout) File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib64/python2.6/urllib2.py", line 391, in open response = self._open(req, data) File "/usr/lib64/python2.6/urllib2.py", line 409, in _open '_open', req) File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain result = func(*args) File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open raise URLError(err) URLError: ) 2017-05-19 11:14:28,746 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [1], total: [3] 2017-05-19 11:14:38,724 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) Connection failed to http://nshk-4.openstacklocal:8042 () 2017-05-19 11:14:38,729 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) Connection failed to http://nshk-4.openstacklocal:8042/ws/v1/node/info (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute url_response = urllib2.urlopen(query, timeout=connection_timeout) File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib64/python2.6/urllib2.py", line 391, in open response = self._open(req, data) File "/usr/lib64/python2.6/urllib2.py", line 409, in _open '_open', req) File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain result = func(*args) File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open raise URLError(err) URLError: 
) 2017-05-19 11:15:04,730 [CRITICAL] [HARD] [YARN] [yarn_resourcemanager_webui] (ResourceManager Web UI) Connection failed to http://nshk-1.openstacklocal:8088 () 2017-05-19 11:15:04,730 [UNKNOWN] [HARD] [YARN] [nodemanager_health_summary] (NodeManager Health Summary) Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py", line 155, in execute "LiveNodeManagers", connection_timeout)) File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py", line 195, in get_value_from_jmx response = url_opener.open(query, timeout=connection_timeout) File "/usr/lib64/python2.6/urllib2.py", line 391, in open response = self._open(req, data) File "/usr/lib64/python2.6/urllib2.py", line 409, in _open '_open', req) File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain result = func(*args) File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open raise URLError(err) URLError: 2017-05-19 11:15:04,735 [CRITICAL] [HARD] [YARN] [yarn_app_timeline_server_webui] (App Timeline Web UI) Connection failed to http://nshk-1.openstacklocal:8188/ws/v1/timeline () 2017-05-19 11:15:04,735 [CRITICAL] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) Connection failed to http://nshk-1.openstacklocal:19888 () 2017-05-19 11:15:04,736 [CRITICAL] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) Connection failed: [Errno 111] Connection refused to nshk-1.openstacklocal:19888 2017-05-19 11:15:07,707 [CRITICAL] [HARD] [HIVE] [hive_server_process] (HiveServer2 Process) Connection failed on host nshk-1.openstacklocal:10000 (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 211, in execute 
ldap_password=ldap_password) File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 79, in check_thrift_port_sasl timeout=check_command_timeout) File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call raise ExecutionFailed(err_msg, code, out, err) ExecutionFailed: Execution of '! beeline -u 'jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL'' returned 1. 
Error: Could not open client transport with JDBC Uri: jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0) Error: Could not open client transport with JDBC Uri: jdbc:hive2://nshk-1.openstacklocal:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0) ) 2017-05-19 11:15:09,710 [CRITICAL] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore on nshk-1.openstacklocal failed (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 200, in execute timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE, File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call raise ExecutionFailed(err_msg, code, out, err) ExecutionFailed: Execution of 'export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' 
; hive --hiveconf hive.metastore.uris=thrift://nshk-1.openstacklocal:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;'' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender. Logging initialized using configuration in file:/etc/hive/2.5.3.0-37/0/hive-log4j.properties Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1551) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:89) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:135) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:107) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3252) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3271) at 
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:524)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1549)
	... 14 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
	at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:446)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:244)
	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1549)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:89)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:135)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:107)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3252)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3271)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:524)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
	... 22 more
)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:492)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:244)
	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
	... 19 more
)
2017-05-19 11:15:15,721 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) Connection failed to http://nshk-2.openstacklocal:8042 ()
2017-05-19 11:15:15,726 [CRITICAL] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) Connection failed to http://nshk-2.openstacklocal:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: )
2017-05-19 11:19:04,749 [UNKNOWN] [HARD] [MAPREDUCE2] [mapreduce_history_server_cpu] (History Server CPU Utilization) [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response
2017-05-19 11:19:04,750 [UNKNOWN] [HARD] [MAPREDUCE2] [mapreduce_history_server_rpc_latency] (History Server RPC Latency) [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response
2017-05-19 11:19:04,752 [UNKNOWN] [HARD] [YARN] [yarn_resourcemanager_rpc_latency] (ResourceManager RPC Latency) [Alert][yarn_resourcemanager_rpc_latency] Unable to extract JSON from JMX response
2017-05-19 11:19:04,753 [UNKNOWN] [HARD] [YARN] [yarn_resourcemanager_cpu] (ResourceManager CPU Utilization) [Alert][yarn_resourcemanager_cpu] Unable to extract JSON from JMX response
2017-05-19 11:30:29,806 [CRITICAL] [HARD] [HDFS] [datanode_process] (DataNode Process) Connection failed: [Errno 111] Connection refused to nshk-3.openstacklocal:50010
2017-05-19 11:30:29,808 [CRITICAL] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) Connection failed to http://nshk-3.openstacklocal:50075 ()
2017-05-19 11:30:29,822 [CRITICAL] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [1], total: [3]
2017-05-19 11:30:38,752 [CRITICAL] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) Connection failed to http://nshk-4.openstacklocal:50075 ()
2017-05-19 11:30:38,752 [CRITICAL] [HARD] [HDFS] [datanode_process] (DataNode Process) Connection failed: [Errno 111] Connection refused to nshk-4.openstacklocal:50010
2017-05-19 11:31:04,771 [UNKNOWN] [HARD] [HDFS] [namenode_last_checkpoint] (NameNode Last Checkpoint) Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/alerts/alert_checkpoint_time.py", line 203, in execute
    "LastCheckpointTime", connection_timeout))
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/alerts/alert_checkpoint_time.py", line 246, in get_value_from_jmx
    response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError:
2017-05-19 11:31:04,772 [CRITICAL] [HARD] [HDFS] [namenode_webui] (NameNode Web UI) Connection failed to http://nshk-1.openstacklocal:50070 ()
2017-05-19 11:31:04,774 [UNKNOWN] [HARD] [HDFS] [datanode_health_summary] (DataNode Health Summary) [Alert][datanode_health_summary] Unable to extract JSON from JMX response
2017-05-19 11:31:04,775 [UNKNOWN] [HARD] [HDFS] [upgrade_finalized_state] (HDFS Upgrade Finalized State) Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/alerts/alert_upgrade_finalized.py", line 140, in execute
    "UpgradeFinalized"))
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/alerts/alert_upgrade_finalized.py", line 169, in get_value_from_jmx
    response = urllib2.urlopen(query, timeout=int(CONNECTION_TIMEOUT_DEFAULT))
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError:
2017-05-19 11:31:04,777 [UNKNOWN] [HARD] [HDFS] [namenode_hdfs_pending_deletion_blocks] (HDFS Pending Deletion Blocks) [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
2017-05-19 11:31:04,778 [UNKNOWN] [HARD] [HDFS] [namenode_hdfs_blocks_health] (NameNode Blocks Health) [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
2017-05-19 11:31:04,779 [UNKNOWN] [HARD] [HDFS] [namenode_hdfs_capacity_utilization] (HDFS Capacity Utilization) [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
2017-05-19 11:31:04,781 [UNKNOWN] [HARD] [HDFS] [namenode_rpc_latency] (NameNode RPC Latency) [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
2017-05-19 11:31:04,781 [UNKNOWN] [HARD] [HDFS] [namenode_directory_status] (NameNode Directory Status) [Alert][namenode_directory_status] Unable to extract JSON from JMX response
2017-05-19 11:31:15,733 [UNKNOWN] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
2017-05-19 11:31:15,733 [CRITICAL] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) Connection failed to http://nshk-2.openstacklocal:50075 ()
2017-05-19 11:31:15,734 [CRITICAL] [HARD] [HDFS] [secondary_namenode_process] (Secondary NameNode Process) Connection failed to http://nshk-2.openstacklocal:50090 ()
2017-05-19 11:31:15,735 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response
2017-05-19 11:31:15,738 [CRITICAL] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) Connection failed: [Errno 111] Connection refused to nshk-2.openstacklocal:2181
2017-05-19 11:31:15,740 [CRITICAL] [HARD] [HDFS] [datanode_process] (DataNode Process) Connection failed: [Errno 111] Connection refused to nshk-2.openstacklocal:50010
2017-05-19 11:31:15,749 [UNKNOWN] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) There are alerts with a state of UNKNOWN.
2017-05-19 11:31:31,051 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response
2017-05-19 11:31:31,053 [CRITICAL] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) Connection failed: [Errno 111] Connection refused to nshk-3.openstacklocal:2181
2017-05-19 11:31:31,058 [UNKNOWN] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
2017-05-19 11:31:31,069 [WARNING] [HARD] [ZOOKEEPER] [zookeeper_server_process_percent] (Percent ZooKeeper Servers Available) affected: [2], total: [3]
2017-05-19 11:31:38,730 [UNKNOWN] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
2017-05-19 11:31:38,730 [UNKNOWN] [HARD] [HDFS] [datanode_storage] (DataNode Storage) [Alert][datanode_storage] Unable to extract JSON from JMX response
2017-05-19 11:32:04,727 [OK] [HARD] [HDFS] [namenode_last_checkpoint] (NameNode Last Checkpoint) Last Checkpoint: [0 hours, 1 minutes, 3 transactions]
2017-05-19 11:32:04,728 [OK] [HARD] [HDFS] [namenode_webui] (NameNode Web UI) HTTP 200 response in 0.000s
2017-05-19 11:32:04,731 [OK] [HARD] [HDFS] [datanode_health_summary] (DataNode Health Summary) All 0 DataNode(s) are healthy
2017-05-19 11:32:04,732 [OK] [HARD] [HDFS] [upgrade_finalized_state] (HDFS Upgrade Finalized State) HDFS cluster is not in the upgrade state
2017-05-19 11:32:04,733 [OK] [HARD] [HDFS] [namenode_directory_status] (NameNode Directory Status) Directories are healthy
2017-05-19 11:32:14,791 [OK] [HARD] [HDFS] [secondary_namenode_process] (Secondary NameNode Process) HTTP 200 response in 0.000s
2017-05-19 11:32:14,791 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.000s response on port 2181
2017-05-19 11:32:14,799 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process_percent] (Percent ZooKeeper Servers Available) affected: [1], total: [3]
2017-05-19 11:32:29,758 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s
2017-05-19 11:32:29,758 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010
2017-05-19 11:32:29,763 [OK] [HARD] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.000s response on port 2181
2017-05-19 11:32:38,729 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s
2017-05-19 11:32:38,730 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010
2017-05-19 11:33:04,739 [OK] [HARD] [HDFS] [namenode_hdfs_pending_deletion_blocks] (HDFS Pending Deletion Blocks) Pending Deletion Blocks:[0]
2017-05-19 11:33:04,740 [OK] [HARD] [HDFS] [namenode_hdfs_blocks_health] (NameNode Blocks Health) Total Blocks:[111], Missing Blocks:[0]
2017-05-19 11:33:04,745 [OK] [HARD] [HDFS] [namenode_hdfs_capacity_utilization] (HDFS Capacity Utilization) Capacity Used:[1%, 1835737088], Capacity Remaining:[206868537856]
2017-05-19 11:33:04,746 [OK] [HARD] [HDFS] [namenode_rpc_latency] (NameNode RPC Latency) Average Queue Time:[0.5], Average Processing Time:[0.5625]
2017-05-19 11:33:14,737 [OK] [HARD] [HDFS] [datanode_webui] (DataNode Web UI) HTTP 200 response in 0.000s
2017-05-19 11:33:14,737 [OK] [HARD] [HDFS] [datanode_process] (DataNode Process) TCP OK - 0.000s response on port 50010
2017-05-19 11:33:14,741 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[68786234592], Total Capacity:[7% Used, 73861871104]
2017-05-19 11:33:14,744 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[6%, 55.836456 MB], Max Heap: 1004.0 MB
2017-05-19 11:33:14,773 [OK] [HARD] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [0], total: [3]
2017-05-19 11:33:30,734 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[6%, 65.01352 MB], Max Heap: 1004.0 MB
2017-05-19 11:33:30,735 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[68650993710], Total Capacity:[7% Used, 73861871104]
2017-05-19 11:33:39,723 [OK] [HARD] [HDFS] [datanode_storage] (DataNode Storage) Remaining Capacity:[69160506880], Total Capacity:[6% Used, 73861871104]
2017-05-19 11:33:39,725 [OK] [HARD] [HDFS] [datanode_heap_usage] (DataNode Heap Usage) Used Heap:[5%, 49.06418 MB], Max Heap: 1004.0 MB
2017-05-19 11:33:39,728 [OK] [HARD] [HDFS] [datanode_storage_percent] (Percent DataNodes With Available Space) affected: [0], total: [3]
2017-05-19 11:34:04,750 [OK] [HARD] [HDFS] [namenode_client_rpc_processing_latency_hourly] (NameNode Client RPC Processing Latency (Hourly)) There were no data points above the minimum threshold of 30 seconds
2017-05-19 11:34:04,752 [OK] [HARD] [HDFS] [namenode_client_rpc_queue_latency_hourly] (NameNode Client RPC Queue Latency (Hourly)) There were no data points above the minimum threshold of 30 seconds
2017-05-19 11:35:04,755 [OK] [HARD] [YARN] [yarn_resourcemanager_webui] (ResourceManager Web UI) HTTP 200 response in 0.000s
2017-05-19 11:35:04,756 [OK] [HARD] [YARN] [nodemanager_health_summary] (NodeManager Health Summary) All NodeManagers are healthy
2017-05-19 11:35:04,758 [OK] [HARD] [YARN] [yarn_app_timeline_server_webui] (App Timeline Web UI) HTTP 200 response in 0.000s
2017-05-19 11:35:04,759 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_webui] (History Server Web UI) HTTP 200 response in 0.000s
2017-05-19 11:35:04,760 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_process] (History Server Process) TCP OK - 0.000s response on port 19888
2017-05-19 11:35:15,718 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s
2017-05-19 11:35:15,720 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy
2017-05-19 11:35:28,723 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s
2017-05-19 11:35:28,725 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy
2017-05-19 11:35:39,751 [OK] [HARD] [YARN] [yarn_nodemanager_webui] (NodeManager Web UI) HTTP 200 response in 0.000s
2017-05-19 11:35:39,752 [OK] [HARD] [YARN] [yarn_nodemanager_health] (NodeManager Health) NodeManager Healthy
2017-05-19 11:35:39,759 [OK] [HARD] [YARN] [yarn_nodemanager_webui_percent] (Percent NodeManagers Available) affected: [0], total: [3]
2017-05-19 11:37:05,733 [OK] [HARD] [HIVE] [hive_webhcat_server_status] (WebHCat Server Status) WebHCat status was OK (0.321s response from http://nshk-1.openstacklocal:50111/templeton/v1/status?user.name=ambari-qa)
2017-05-19 11:38:04,735 [OK] [HARD] [SPARK] [SPARK_JOBHISTORYSERVER_PROCESS] (Spark History Server) TCP OK - 0.000s response on port 18080
2017-05-19 11:38:04,736 [OK] [HARD] [ZEPPELIN] [zeppelin_server_status] (Zeppelin Server Status) Successful connection to Zeppelin
2017-05-19 11:39:04,753 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_cpu] (History Server CPU Utilization) 4 CPU, load 4.7%
2017-05-19 11:39:04,753 [OK] [HARD] [MAPREDUCE2] [mapreduce_history_server_rpc_latency] (History Server RPC Latency) Average Queue Time:[0.0], Average Processing Time:[0.0]
2017-05-19 11:39:04,754 [OK] [HARD] [YARN] [yarn_resourcemanager_rpc_latency] (ResourceManager RPC Latency) Average Queue Time:[0.0], Average Processing Time:[2.0]
2017-05-19 11:39:04,755 [OK] [HARD] [YARN] [yarn_resourcemanager_cpu] (ResourceManager CPU Utilization) 4 CPU, load 4.7%
2017-05-19 11:39:07,708 [OK] [HARD] [HIVE] [hive_server_process] (HiveServer2 Process) TCP OK - 2.716s response on port 10000
2017-05-19 11:39:11,710 [OK] [HARD] [HIVE] [hive_metastore_process] (Hive Metastore Process) Metastore OK - Hive command took 7.412s
2017-05-19 18:09:06,768 [OK] [HARD] [HDFS] [namenode_client_rpc_queue_latency_daily] (NameNode Client RPC Queue Latency (Daily)) There were no data points above the minimum threshold of 30 seconds
2017-05-19 18:09:06,768 [OK] [HARD] [HDFS] [namenode_client_rpc_processing_latency_daily] (NameNode Client RPC Processing Latency (Daily)) There were no data points above the minimum threshold of 30 seconds
2017-05-19 18:09:06,771 [WARNING] [HARD] [HDFS] [increase_nn_heap_usage_daily] (NameNode Heap Usage (Daily)) The variance for this alert is 63MB which is 34% of the 186MB average (37MB is the limit)
2017-05-19 18:09:06,772 [WARNING] [HARD] [HDFS] [namenode_increase_in_storage_capacity_usage_daily] (HDFS Storage Capacity Usage (Daily)) The variance for this alert is 950,843,960B which is 36% of the 2,626,832,493B average (788,049,748B is the limit)
2017-05-20 10:09:04,767 [OK] [HARD] [HDFS] [namenode_increase_in_storage_capacity_usage_daily] (HDFS Storage Capacity Usage (Daily)) The variance for this alert is 465,470,609B which is within 30% of the 3,602,931,848B average (1,080,879,554B is the limit)
2017-05-21 02:09:04,774 [OK] [HARD] [HDFS] [increase_nn_heap_usage_daily] (NameNode Heap Usage (Daily)) There were no data points above the minimum threshold of 100 seconds