ERROR 2018-10-05 09:03:10,747 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
WARNING 2018-10-05 09:03:54,585 base_alert.py:138 - [Alert][hbase_master_cpu] Unable to execute alert. [Alert][hbase_master_cpu] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:03:54,596 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:03:54,607 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:03:54,609 base_alert.py:138 - [Alert][yarn_resourcemanager_cpu] Unable to execute alert. [Alert][yarn_resourcemanager_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,614 base_alert.py:138 - [Alert][yarn_resourcemanager_rpc_latency] Unable to execute alert. [Alert][yarn_resourcemanager_rpc_latency] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,627 base_alert.py:138 - [Alert][ams_metrics_collector_hbase_master_cpu] Unable to execute alert. [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,633 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:03:54,639 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,640 base_alert.py:138 - [Alert][namenode_cpu] Unable to execute alert. [Alert][namenode_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,644 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:03:54,648 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:03:54,651 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,655 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,663 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,665 base_alert.py:138 - [Alert][namenode_service_rpc_processing_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:03:54,668 base_alert.py:138 - [Alert][namenode_client_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:03:54,677 base_alert.py:138 - [Alert][namenode_client_rpc_processing_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:03:54,679 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,682 base_alert.py:138 - [Alert][namenode_service_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:03:54,686 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,686 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,719 base_alert.py:138 - [Alert][mapreduce_history_server_rpc_latency] Unable to execute alert. [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,721 base_alert.py:138 - [Alert][mapreduce_history_server_cpu] Unable to execute alert. [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:03:54,732 logger.py:71 - Cannot find the stack name in the command. Stack tools cannot be loaded
INFO 2018-10-05 09:03:54,733 logger.py:75 - call[('ambari-python-wrap', None, 'versions')] {}
INFO 2018-10-05 09:03:54,741 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:03:54,742 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:03:54,759 logger.py:75 - call returned (1, "/bin/ambari-python-wrap: can't find '__main__' module in ''")
ERROR 2018-10-05 09:03:54,760 script_alert.py:123 - [Alert][ambari_agent_version_select] Failed with result CRITICAL: ["hdp-select could not properly read /usr/hdp. Check this directory for unexpected contents.\n/bin/ambari-python-wrap: can't find '__main__' module in ''"]
INFO 2018-10-05 09:03:54,762 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
INFO 2018-10-05 09:03:59,700 Controller.py:304 - Heartbeat (response id = 56358) with server is running...
INFO 2018-10-05 09:03:59,700 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:03:59,705 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:03:59,839 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:04:00,034 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:04:00,035 Controller.py:320 - Sending Heartbeat (id = 56358)
INFO 2018-10-05 09:04:00,037 Controller.py:333 - Heartbeat response received (id = 56359)
INFO 2018-10-05 09:04:00,037 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:04:00,037 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:04:00,037 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:04:00,037 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:04:00,937 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:04:10,799 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
ERROR 2018-10-05 09:04:54,614 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:04:54,622 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:04:54,638 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:04:54,644 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:04:54,659 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2018-10-05 09:04:54,700 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:04:54,716 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:04:54,724 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:04:54,724 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:04:54,726 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:04:59,043 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']
INFO 2018-10-05 09:05:00,042 Controller.py:304 - Heartbeat (response id = 56424) with server is running...
INFO 2018-10-05 09:05:00,043 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:05:00,045 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:05:00,120 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:05:00,332 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0 INFO 2018-10-05 09:05:00,334 Controller.py:320 - Sending Heartbeat (id = 56424) INFO 2018-10-05 09:05:00,336 Controller.py:333 - Heartbeat response received (id = 56425) INFO 2018-10-05 09:05:00,336 Controller.py:342 - Heartbeat interval is 1 seconds INFO 2018-10-05 09:05:00,336 Controller.py:380 - Updating configurations from heartbeat INFO 2018-10-05 09:05:00,336 Controller.py:389 - Adding cancel/execution commands INFO 2018-10-05 09:05:00,337 Controller.py:475 - Waiting 0.9 for next heartbeat INFO 2018-10-05 09:05:01,237 Controller.py:482 - Wait for next heartbeat over ERROR 2018-10-05 09:05:02,443 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File 
"/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. 
Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat 
org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)']
ERROR 2018-10-05 09:05:10,614 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. 
Exception = Connection refused"]
ERROR 2018-10-05 09:05:54,585 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:05:54,611 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:05:54,618 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:05:54,624 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:05:54,625 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:05:54,629 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:05:54,633 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:05:54,634 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:05:54,649 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:05:54,655 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:05:54,658 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:05:54,659 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
INFO 2018-10-05 09:05:54,717 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:05:54,717 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:05:54,720 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
INFO 2018-10-05 09:06:00,391 Controller.py:304 - Heartbeat (response id = 56490) with server is running...
INFO 2018-10-05 09:06:00,391 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:06:00,396 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:06:00,549 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:06:00,796 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:06:00,797 Controller.py:320 - Sending Heartbeat (id = 56490)
INFO 2018-10-05 09:06:00,799 Controller.py:333 - Heartbeat response received (id = 56491)
INFO 2018-10-05 09:06:00,799 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:06:00,799 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:06:00,799 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:06:00,799 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:06:01,699 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:06:10,719 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
ERROR 2018-10-05 09:06:54,610 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:06:54,634 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:06:54,636 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:06:54,644 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:06:54,666 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2018-10-05 09:06:54,728 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:06:54,728 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:06:54,734 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
INFO 2018-10-05 09:07:00,836 Controller.py:304 - Heartbeat (response id = 56556) with server is running...
INFO 2018-10-05 09:07:00,837 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:07:00,838 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:07:00,897 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:07:01,080 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:07:01,081 Controller.py:320 - Sending Heartbeat (id = 56556)
INFO 2018-10-05 09:07:01,083 Controller.py:333 - Heartbeat response received (id = 56557)
INFO 2018-10-05 09:07:01,083 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:07:01,083 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:07:01,083 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:07:01,083 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:07:01,984 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:07:10,762 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:07:28,314 main.py:145 - loglevel=logging.INFO
INFO 2018-10-05 09:07:28,315 main.py:145 - loglevel=logging.INFO
INFO 2018-10-05 09:07:28,321 HeartbeatHandlers.py:84 - Ambari-agent received 15 signal, stopping...
INFO 2018-10-05 09:07:29,231 HeartbeatHandlers.py:116 - Stop event received
INFO 2018-10-05 09:07:29,231 Controller.py:479 - Stop event received
INFO 2018-10-05 09:07:29,233 Controller.py:503 - Finished heartbeating and registering cycle
INFO 2018-10-05 09:07:29,233 Controller.py:509 - Controller thread has successfully finished
INFO 2018-10-05 09:07:29,324 ExitHelper.py:56 - Performing cleanup before exiting...
INFO 2018-10-05 09:07:29,325 threadpool.py:120 - Shutting down thread pool
INFO 2018-10-05 09:07:29,336 scheduler.py:606 - Scheduler has been shut down
INFO 2018-10-05 09:07:29,337 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
INFO 2018-10-05 09:07:29,340 AlertSchedulerHandler.py:185 - [AlertScheduler] Stopped the alert scheduler.
INFO 2018-10-05 09:07:29,340 threadpool.py:120 - Shutting down thread pool
INFO 2018-10-05 09:07:29,340 ExitHelper.py:70 - Cleanup finished, exiting with code:0
INFO 2018-10-05 09:07:31,346 main.py:283 - Agent died gracefully, exiting.
INFO 2018-10-05 09:07:31,347 ExitHelper.py:56 - Performing cleanup before exiting...
INFO 2018-10-05 09:07:32,022 main.py:145 - loglevel=logging.INFO
INFO 2018-10-05 09:07:32,022 main.py:145 - loglevel=logging.INFO
INFO 2018-10-05 09:07:32,022 main.py:145 - loglevel=logging.INFO
INFO 2018-10-05 09:07:32,024 DataCleaner.py:39 - Data cleanup thread started
INFO 2018-10-05 09:07:32,025 DataCleaner.py:120 - Data cleanup started
INFO 2018-10-05 09:07:32,026 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'master.hadoop.com' using socket.getfqdn().
INFO 2018-10-05 09:07:32,047 DataCleaner.py:122 - Data cleanup finished
INFO 2018-10-05 09:07:32,128 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2018-10-05 09:07:32,133 main.py:437 - Connecting to Ambari server at https://master:8440 (10.253.9.66)
INFO 2018-10-05 09:07:32,134 NetUtil.py:70 - Connecting to https://master:8440/ca
INFO 2018-10-05 09:07:32,262 main.py:447 - Connected to Ambari server master
INFO 2018-10-05 09:07:32,340 threadpool.py:58 - Started thread pool with 3 core threads and 20 maximum threads
INFO 2018-10-05 09:07:32,371 AlertSchedulerHandler.py:291 - [AlertScheduler] Caching cluster Ambari with alert hash e57151140a3dd93299beda281929c2bd
INFO 2018-10-05 09:07:32,393 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,393 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling hbase_master_process with UUID 99dd6617-0e35-489d-be13-2f013a3042f0
INFO 2018-10-05 09:07:32,393 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,394 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling hbase_master_cpu with UUID 680639b5-9e40-442f-ba14-97edeb410140
INFO 2018-10-05 09:07:32,394 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,394 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling hbase_regionserver_process with UUID b6b0d956-5fac-4268-ba2d-30a7b174a5d2
INFO 2018-10-05 09:07:32,394 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,394 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling infra_solr with UUID eb5ebbfc-9941-41e5-a125-f71f7ef0e4d3
INFO 2018-10-05 09:07:32,394 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,394 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling hive_metastore_process with UUID e92dc31a-bc08-4649-a82a-59eb39f0ae18
INFO 2018-10-05 09:07:32,394 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,394 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling hive_server_process with UUID 4fdec797-8738-40ac-bc6a-b207f8131c37
INFO 2018-10-05 09:07:32,394 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,395 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling hive_webhcat_server_status with UUID da41ab24-976f-4fc3-a4cc-d27f31262486
INFO 2018-10-05 09:07:32,395 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,395 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling SPARK2_JOBHISTORYSERVER_PROCESS with UUID 9c530e20-c121-451d-ab14-ab9c89b6d941
INFO 2018-10-05 09:07:32,395 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,395 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling yarn_nodemanager_health with UUID cfb7c7bc-9458-4a4d-96f7-d97a14e8a7f9
INFO 2018-10-05 09:07:32,395 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,395 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling yarn_resourcemanager_webui with UUID c4acf88d-73d4-4de0-8c02-860e069c9be6
INFO 2018-10-05 09:07:32,395 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,395 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling yarn_resourcemanager_cpu with UUID d46bcae0-19e9-4ce5-ae37-979b959742f6
INFO 2018-10-05 09:07:32,395 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,395 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling yarn_nodemanager_webui with UUID a97340bd-5043-4424-93b5-cff72034ffd9
INFO 2018-10-05 09:07:32,396 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,396 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling yarn_resourcemanager_rpc_latency with UUID faace611-c229-4c0f-9be9-dc724ba3eb42
INFO 2018-10-05 09:07:32,396 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,396 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling nodemanager_health_summary with UUID bc681794-3ec8-4513-b298-1d659c2dce77
INFO 2018-10-05 09:07:32,396 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,396 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling yarn_app_timeline_server_webui with UUID 8a8a5e0b-16db-4ffc-bf5c-5d9f11861b47
INFO 2018-10-05 09:07:32,396 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,396 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling ams_metrics_monitor_process with UUID 904e0a1f-dafa-4741-93a1-fd9b34f0f47e
INFO 2018-10-05 09:07:32,396 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,396 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling ams_metrics_collector_process with UUID 86a87ced-5f21-41eb-9fd9-1b6d35deda01
INFO 2018-10-05 09:07:32,396 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,396 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling ams_metrics_collector_hbase_master_process with UUID 77c826fa-9de1-4191-8f8d-13c9eb02cb38
INFO 2018-10-05 09:07:32,397 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,397 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling ams_metrics_collector_hbase_master_cpu with UUID 78fd2714-5ac0-4a56-94f2-6a067e913b77
INFO 2018-10-05 09:07:32,397 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,397 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling ams_metrics_collector_autostart with UUID bbfd183c-b3de-4fb9-9388-a9319192e8c0
INFO 2018-10-05 09:07:32,397 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,397 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling grafana_webui with UUID ec2de1fc-1dbb-4743-b07d-928a8fc082a2
INFO 2018-10-05 09:07:32,397 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,397 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling accumulo_tserver_process with UUID f913aa4b-50a1-4bad-bfa4-f28f7d34163d
INFO 2018-10-05 09:07:32,397 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,397 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling accumulo_monitor_process with UUID f3c1a8c8-bff0-439a-a702-faadc0aead5d
INFO 2018-10-05 09:07:32,397 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,398 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling accumulo_gc_process with UUID 3bba0674-599e-44b7-a2c1-795746149422
INFO 2018-10-05 09:07:32,398 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,398 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling accumulo_tracer_process with UUID 7491229a-051d-48b2-9911-9e6fa878cce6
INFO 2018-10-05 09:07:32,398 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,398 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling accumulo_master_process with UUID 39d5bc6e-7d7b-4e46-90c1-75bbf40b7527
INFO 2018-10-05 09:07:32,398 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,398 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling zeppelin_server_status with UUID ff3560b5-ac71-4d92-ab9d-017028847032
INFO 2018-10-05 09:07:32,398 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,398 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_cpu with UUID 5c360204-b520-4aef-81d7-355686497ba9
INFO 2018-10-05 09:07:32,398 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,398 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling secondary_namenode_process with UUID ee7ff243-edba-4be2-af4b-cf420697de82
INFO 2018-10-05 09:07:32,398 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,399 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_hdfs_pending_deletion_blocks with UUID 2c4c7de3-7887-42fb-beb6-2209bf3dd6aa
INFO 2018-10-05 09:07:32,399 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,399 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_client_rpc_queue_latency_daily with UUID d71085be-400d-479f-a73a-9eeb25f20f3e
INFO 2018-10-05 09:07:32,399 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,399 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_ha_health with UUID fba4cdeb-c825-435e-a475-bcb58a7c9ba5
INFO 2018-10-05 09:07:32,399 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,399 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling datanode_heap_usage with UUID 630e225c-3ee6-4102-9ef5-507afb7dd380
INFO 2018-10-05 09:07:32,399 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,399 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling datanode_health_summary with UUID a1bfea70-a617-4b2f-b9c1-58337c21f1d3
INFO 2018-10-05 09:07:32,399 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,399 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling datanode_unmounted_data_dir with UUID 4a56ee0e-08d3-48c2-a22e-cc80e6bade29
INFO 2018-10-05 09:07:32,400 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,400 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_service_rpc_queue_latency_daily with UUID f9de0732-d350-44e1-a957-ce2bcc4b9080
INFO 2018-10-05 09:07:32,400 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,400 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling datanode_process with UUID 9fef6e81-f1e4-459f-a012-98b6debc8715
INFO 2018-10-05 09:07:32,400 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,400 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_client_rpc_processing_latency_daily with UUID f16aba1e-9208-4bb7-83c1-856c4a78ba97
INFO 2018-10-05 09:07:32,400 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,400 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_hdfs_blocks_health with UUID 3783532b-c605-4550-9b00-b7dc3b6dacbf
INFO 2018-10-05 09:07:32,400 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,400 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_webui with UUID 30f53052-754c-4f72-b4f0-59772f4a50c9
INFO 2018-10-05 09:07:32,400 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,401 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling datanode_webui with UUID 73036d66-27ba-40ed-8ce3-6566556abfdc
INFO 2018-10-05 09:07:32,401 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,401 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling datanode_storage with UUID 9ee197f8-d90c-4cd3-8f68-a510c3326814
INFO 2018-10-05 09:07:32,401 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,401 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_service_rpc_processing_latency_hourly with UUID c359bb1e-ae0c-4198-8b7c-0c30728b8bed
INFO 2018-10-05 09:07:32,401 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,401 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_increase_in_storage_capacity_usage_daily with UUID 4c39cefc-6604-4753-b444-69966fe223a4
INFO 2018-10-05 09:07:32,401 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,401 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_client_rpc_queue_latency_hourly with UUID dfaf65ea-4710-4d6c-8de2-571db50425f7
INFO 2018-10-05 09:07:32,401 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,401 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_service_rpc_processing_latency_daily with UUID c55f0e0c-b225-42be-8abd-f8cb154e15ed
INFO 2018-10-05 09:07:32,401 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,402 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling upgrade_finalized_state with UUID 18638b6e-c597-4945-91cd-28839d3e0379
INFO 2018-10-05 09:07:32,402 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,402 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_client_rpc_processing_latency_hourly with UUID 28812c41-c0fb-4488-ac44-dc84a09c084a
INFO 2018-10-05 09:07:32,402 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,402 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_increase_in_storage_capacity_usage_weekly with UUID 8e822b31-756d-40e9-b57f-61b5cf76131c
INFO 2018-10-05 09:07:32,402 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,402 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_directory_status with UUID 9fe3703f-e037-4bcf-a7ee-e2855ecf082f
INFO 2018-10-05 09:07:32,402 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,402 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_service_rpc_queue_latency_hourly with UUID 749b3622-b66a-4a90-bb4e-85cf9b2b8405
INFO 2018-10-05 09:07:32,402 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,403 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling increase_nn_heap_usage_weekly with UUID 55e7b131-6822-45ee-b8b3-49715ef9a38c
INFO 2018-10-05 09:07:32,403 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,403 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_hdfs_capacity_utilization with UUID ef18a31d-8add-470f-91c5-8d23b2afdee6
INFO 2018-10-05 09:07:32,403 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,403 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_rpc_latency with UUID c6263c40-d149-4902-bc60-688c27724b62
INFO 2018-10-05 09:07:32,403 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,403 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling namenode_last_checkpoint with UUID 801246ad-014e-4c54-9467-1aba8054e1d4
INFO 2018-10-05 09:07:32,403 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,403 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling increase_nn_heap_usage_daily with UUID 5c0a38f7-f677-4433-8d1f-f0de5eaff588
INFO 2018-10-05 09:07:32,403 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,403 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling smartsense_server_process with UUID d61d7c08-d36b-450f-bc83-5d61c9ec6068
INFO 2018-10-05 09:07:32,403 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,404 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling smartsense_bundle_failed_or_timedout with UUID 6275d276-45ed-4689-95b0-8f32064f41c0
INFO 2018-10-05 09:07:32,404 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,404 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling smartsense_gateway_status with UUID 8ba60c76-1a06-4946-b946-f33ca5b7dbea
INFO 2018-10-05 09:07:32,404 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,404 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling smartsense_long_running_bundle with UUID 17b7e4d0-b35b-4df6-9a9d-5b5931a29941
INFO 2018-10-05 09:07:32,404 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,404 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling oozie_server_webui with UUID c647a987-98ec-4b17-b645-d9342dbd3e8d
INFO 2018-10-05 09:07:32,404 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,404 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling oozie_server_status with UUID 01233d6a-74be-4d5d-bb99-5d24ce868e4e
INFO 2018-10-05 09:07:32,404 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,404 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling kafka_broker_process with UUID 98c44138-0b5a-4cb3-abac-22c7630d8917
INFO 2018-10-05 09:07:32,405 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,405 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling flume_agent_status with UUID d8e926c9-4f27-4107-8938-22c516f2d81b
INFO 2018-10-05 09:07:32,405 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,405 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling falcon_server_webui with UUID 635dee29-ff25-42a4-9221-de330c51d6c2
INFO 2018-10-05 09:07:32,405 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,405 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling falcon_server_process with UUID 89f9353c-3536-4306-8dcf-0572c967d454
INFO 2018-10-05 09:07:32,405 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,405 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling storm_webui with UUID 71f89d6b-070c-4a3b-b96d-f2ac2c15ee46
INFO 2018-10-05 09:07:32,405 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,405 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling storm_supervisor_process with UUID c300b7fe-fe3c-439a-ac5b-efb23bdc3589
INFO 2018-10-05 09:07:32,405 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,405 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling storm_nimbus_process with UUID c6ca0560-30a1-4a36-8301-dcc2d5535ac3
INFO 2018-10-05 09:07:32,406 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,406 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling storm_drpc_server with UUID bf1c74f1-58e8-452f-a859-43d84988dfd7
INFO 2018-10-05 09:07:32,406 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,406 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling mapreduce_history_server_process with UUID d922b922-b983-4f26-b15f-0bb9da096d29
INFO 2018-10-05 09:07:32,406 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,406 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling mapreduce_history_server_webui with UUID d85451c2-cf9c-41da-842b-9e0f1ba57223
INFO 2018-10-05 09:07:32,406 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,406 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling mapreduce_history_server_rpc_latency with UUID 2cfd93a5-5530-4cb5-aa14-78eb5b08c9d5
INFO 2018-10-05 09:07:32,406 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,406 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling mapreduce_history_server_cpu with UUID c958daea-00ce-4765-ab80-dd47faafae84
INFO 2018-10-05 09:07:32,406 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,407 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling metadata_server_webui with UUID cba6b1cd-e91e-4a06-b326-fd5acac267c6
INFO 2018-10-05 09:07:32,407 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,407 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling zookeeper_server_process with UUID fcc22914-a542-4ec3-b5c9-13cc6c49d5e2
INFO 2018-10-05 09:07:32,407 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,407 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling knox_gateway_process with UUID 48fe1bbb-7f19-4df4-b1f0-87c18178184f
INFO 2018-10-05 09:07:32,407 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,407 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling SPARK_JOBHISTORYSERVER_PROCESS with UUID a0533d0a-3de1-4219-bcd7-6fc8a764952b
INFO 2018-10-05 09:07:32,407 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,407 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling ambari_agent_version_select with UUID 86c502ce-7ace-40d6-a36c-8ce7f829ad40
INFO 2018-10-05 09:07:32,407 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO 2018-10-05 09:07:32,407 AlertSchedulerHandler.py:377 - [AlertScheduler] Scheduling ambari_agent_disk_usage with UUID 9a2d038f-7708-4195-8b98-8a742ba6a4e6
INFO 2018-10-05 09:07:32,407 AlertSchedulerHandler.py:175 - [AlertScheduler] Starting ; currently running: False
INFO 2018-10-05 09:07:34,419 hostname.py:106 - Read public hostname 'master.hadoop.com' using socket.getfqdn()
INFO 2018-10-05 09:07:34,421 Hardware.py:68 - Initializing host system information.
INFO 2018-10-05 09:07:34,438 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:07:34,470 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'master.hadoop.com' using socket.getfqdn().
INFO 2018-10-05 09:07:34,476 Facter.py:202 - Directory: '/etc/resource_overrides' does not exist - it won't be used for gathering system resources.
INFO 2018-10-05 09:07:34,486 Hardware.py:73 - Host system information: {'kernel': 'Linux', 'domain': 'hadoop.com', 'physicalprocessorcount': 8, 'kernelrelease': '3.10.0-514.el7.x86_64', 'uptime_days': '0', 'memorytotal': 5892408, 'swapfree': '5.88 GB', 'memorysize': 5892408, 'osfamily': 'redhat', 'swapsize': '5.88 GB', 'processorcount': 8, 'netmask': '255.255.255.0', 'timezone': 'KST', 'hardwareisa': 'x86_64', 'memoryfree': 1377188, 'operatingsystem': 'centos', 'kernelmajversion': '3.10', 'kernelversion': '3.10.0', 'macaddress': 'E8:9A:8F:15:EE:21', 'operatingsystemrelease': '7.3.1611', 'ipaddress': '10.253.9.66', 'hostname': 'master', 'uptime_hours': '14', 'fqdn': 'master.hadoop.com', 'id': 'root', 'architecture': 'x86_64', 'selinux': False, 'mounts': [{'available': '36444268', 'used': '15958932', 'percent': '31%', 'device': '/dev/sda3', 'mountpoint': '/', 'type': 'xfs', 'size': '52403200'}, {'available': '2930948', 'used': '0', 'percent': '0%', 'device': 'devtmpfs', 'mountpoint': '/dev', 'type': 'devtmpfs', 'size': '2930948'}, {'available': '863864', 'used': '174472', 'percent': '17%', 'device': '/dev/sda1', 'mountpoint': '/boot', 'type': 'xfs', 'size': '1038336'}, {'available': '672565756', 'used': '38528', 'percent': '1%', 'device': '/dev/sda5', 'mountpoint': '/home', 'type': 'xfs', 'size': '672604284'}], 'hardwaremodel': 'x86_64', 'uptime_seconds': '52578', 'interfaces': 'enp13s0,lo,virbr0'}
INFO 2018-10-05 09:07:34,601 Controller.py:170 - Registering with master.hadoop.com (10.253.9.66) (agent='{"hardwareProfile": {"kernel": "Linux", "domain": "hadoop.com", "physicalprocessorcount": 8, "kernelrelease": "3.10.0-514.el7.x86_64", "uptime_days": "0", "memorytotal": 5892408, "swapfree": "5.88 GB", "memorysize": 5892408, "osfamily": "redhat", "swapsize": "5.88 GB", "processorcount": 8, "netmask": "255.255.255.0", "timezone": "KST", "hardwareisa": "x86_64", "memoryfree": 1377188, "operatingsystem": "centos", "kernelmajversion": "3.10", "kernelversion": "3.10.0", "macaddress": "E8:9A:8F:15:EE:21", "operatingsystemrelease": "7.3.1611", "ipaddress": "10.253.9.66", "hostname": "master", "uptime_hours": "14", "fqdn": "master.hadoop.com", "id": "root", "architecture": "x86_64", "selinux": false, "mounts": [{"available": "36444268", "used": "15958932", "percent": "31%", "device": "/dev/sda3", "mountpoint": "/", "type": "xfs", "size": "52403200"}, {"available": "2930948", "used": "0", "percent": "0%", "device": "devtmpfs", "mountpoint": "/dev", "type": "devtmpfs", "size": "2930948"}, {"available": "863864", "used": "174472", "percent": "17%", "device": "/dev/sda1", "mountpoint": "/boot", "type": "xfs", "size": "1038336"}, {"available": "672565756", "used": "38528", "percent": "1%", "device": "/dev/sda5", "mountpoint": "/home", "type": "xfs", "size": "672604284"}], "hardwaremodel": "x86_64", "uptime_seconds": "52578", "interfaces": "enp13s0,lo,virbr0"}, "currentPingPort": 8670, "prefix": "/var/lib/ambari-agent/data", "agentVersion": "2.6.1.5", "agentEnv": {"transparentHugePage": "", "hostHealth": {"agentTimeStampAtReporting": 1538698054593, "activeJavaProcs": [{"command": "/usr/jdk64/jdk1.8.0_112/bin/java -server -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:CMSInitiatingOccupancyFraction=60 -Djava.io.tmpdir=/var/lib/smartsense/hst-server/tmp -Dlog.file.name=hst-server.log -Xms1024m -Xmx2048m -cp /etc/hst/conf:/usr/hdp/share/hst/hst-common/lib/* com.hortonworks.support.tools.server.SupportToolServer", "pid": 3053, "hadoop": false, "user": "root"}], "liveServices": [{"status": "Healthy", "name": "ntpd or chronyd", "desc": ""}]}, "reverseLookup": true, "alternatives": [], "hasUnlimitedJcePolicy": null, "umask": "18", "firewallName": "iptables", "stackFoldersAndFiles": [{"type": "directory", "name": "/etc/hadoop"}, {"type": "directory", "name": "/etc/hbase"}, {"type": "directory", "name": "/etc/hive"}, {"type": "directory", "name": "/etc/oozie"}, {"type": "directory", "name": "/etc/zookeeper"}, {"type": "directory", "name": "/etc/flume"}, {"type": "directory", "name": "/etc/storm"}, {"type": "directory", "name": "/etc/hive-hcatalog"}, {"type": "directory", "name": "/etc/tez"}, {"type": "directory", "name": "/etc/falcon"}, {"type": "directory", "name": "/etc/knox"}, {"type": "directory", "name": "/etc/hive-webhcat"}, {"type": "directory", "name": "/etc/kafka"}, {"type": "directory", "name": "/etc/mahout"}, {"type": "directory", "name": "/etc/spark"}, {"type": "directory", "name": "/etc/pig"}, {"type": "directory", "name": "/etc/accumulo"}, {"type": "directory", "name": "/etc/ambari-metrics-collector"}, {"type": "directory", "name": "/etc/ambari-metrics-monitor"}, {"type": "directory", "name": "/etc/atlas"}, {"type": "directory", "name": "/etc/zeppelin"}, {"type": "directory", "name": "/var/log/hbase"}, {"type": "directory", "name": "/var/log/hive"}, {"type": "directory", "name": "/var/log/oozie"}, {"type": "directory", "name": "/var/log/zookeeper"}, {"type": "directory", "name": "/var/log/flume"}, {"type": "directory", "name": "/var/log/storm"}, {"type": "directory", "name": "/var/log/hive-hcatalog"}, {"type": "directory", "name": "/var/log/falcon"}, {"type": "directory", "name": "/var/log/hadoop-hdfs"}, {"type": "directory", "name": "/var/log/hadoop-yarn"}, {"type": "directory", "name": "/var/log/hadoop-mapreduce"}, {"type": "directory", "name": "/var/log/knox"}, {"type": "directory", "name": "/var/log/kafka"}, {"type": "directory", "name": "/var/log/spark"}, {"type": "directory", "name": "/var/log/accumulo"}, {"type": "directory", "name": "/var/log/ambari-metrics-monitor"}, {"type": "directory", "name": "/var/log/zeppelin"}, {"type": "directory", "name": "/usr/lib/flume"}, {"type": "directory", "name": "/usr/lib/storm"}, {"type": "directory", "name": "/usr/lib/ambari-metrics-collector"}, {"type": "directory", "name": "/var/lib/hive"}, {"type": "directory", "name": "/var/lib/oozie"}, {"type": "directory", "name": "/var/lib/zookeeper"}, {"type": "directory", "name": "/var/lib/flume"}, {"type": "directory", "name": "/var/lib/hadoop-hdfs"}, {"type": "directory", "name": "/var/lib/hadoop-yarn"}, {"type": "directory", "name": "/var/lib/hadoop-mapreduce"}, {"type": "directory", "name": "/var/lib/knox"}, {"type": "directory", "name": "/var/lib/spark"}, {"type": "directory", "name": "/var/lib/ambari-metrics-collector"}, {"type": "directory", "name": "/var/lib/zeppelin"}, {"type": "directory", "name": "/var/tmp/oozie"}, {"type": "directory", "name": "/tmp/ambari-qa"}, {"type": "directory", "name": "/hadoop/storm"}, {"type": "directory", "name": "/hadoop/falcon"}], "existingUsers": [{"status": "Available", "name": "hive", "homeDir": "/home/hive"}, {"status": "Available", "name": "atlas", "homeDir": "/home/atlas"}, {"status": "Available", "name": "ams", "homeDir": "/home/ams"}, {"status": "Available", "name": "falcon", "homeDir": "/home/falcon"}, {"status": "Available", "name": "accumulo", "homeDir": "/home/accumulo"}, {"status": "Available", "name": "spark", "homeDir": "/home/spark"}, {"status": "Available", "name": "flume", "homeDir": "/home/flume"}, {"status": "Available", "name": "hbase", "homeDir": "/home/hbase"}, {"status": "Available", "name": "hcat", "homeDir": "/home/hcat"}, {"status": "Available", "name": "storm", "homeDir": "/home/storm"}, {"status": "Available", "name": "zookeeper", "homeDir": "/home/zookeeper"}, {"status": "Available", "name": "oozie", "homeDir": "/home/oozie"}, {"status": "Available", "name": "tez", "homeDir": "/home/tez"}, {"status": "Available", "name": "zeppelin", "homeDir": "/home/zeppelin"}, {"status": "Available", "name": "mahout", "homeDir": "/home/mahout"}, {"status": "Available", "name": "ambari-qa", "homeDir": "/home/ambari-qa"}, {"status": "Available", "name": "kafka", "homeDir": "/home/kafka"}, {"status": "Available", "name": "hdfs", "homeDir": "/home/hdfs"}, {"status": "Available", "name": "sqoop", "homeDir": "/home/sqoop"}, {"status": "Available", "name": "yarn", "homeDir": "/home/yarn"}, {"status": "Available", "name": "mapred", "homeDir": "/home/mapred"}, {"status": "Available", "name": "knox", "homeDir": "/home/knox"}], "firewallRunning": false}, "timestamp": 1538698054488, "hostname": "master.hadoop.com", "responseId": -1, "publicHostname": "master.hadoop.com"}')
INFO 2018-10-05 09:07:34,602 NetUtil.py:70 - Connecting to https://master:8440/connection_info
INFO 2018-10-05 09:07:34,680 security.py:93 - SSL Connect being called.. connecting to the server
INFO 2018-10-05 09:07:34,756 security.py:60 - SSL connection established. Two-way SSL authentication is turned off on the server.
INFO 2018-10-05 09:07:35,338 Controller.py:196 - Registration Successful (response id = 0)
INFO 2018-10-05 09:07:35,338 ClusterConfiguration.py:119 - Updating cached configurations for cluster Ambari
INFO 2018-10-05 09:07:35,393 RecoveryManager.py:577 - RecoverConfig = {u'components': u'METRICS_COLLECTOR', u'maxCount': u'6', u'maxLifetimeCount': u'1024', u'recoveryTimestamp': 1538698054878, u'retryGap': u'5', u'type': u'AUTO_START', u'windowInMinutes': u'60'}
INFO 2018-10-05 09:07:35,393 RecoveryManager.py:677 - ==> Auto recovery is enabled with maximum 6 in 60 minutes with gap of 5 minutes between and lifetime max being 1024. Enabled components - METRICS_COLLECTOR
INFO 2018-10-05 09:07:35,393 AmbariConfig.py:316 - Updating config property (agent.check.remote.mounts) with value (false)
INFO 2018-10-05 09:07:35,393 AmbariConfig.py:316 - Updating config property (agent.auto.cache.update) with value (true)
INFO 2018-10-05 09:07:35,394 AmbariConfig.py:316 - Updating config property (java.home) with value (/usr/jdk64/jdk1.8.0_112)
INFO 2018-10-05 09:07:35,394 AmbariConfig.py:316 - Updating config property (agent.check.mounts.timeout) with value (0)
INFO 2018-10-05 09:07:35,459 AlertSchedulerHandler.py:291 - [AlertScheduler] Caching cluster Ambari with alert hash e57151140a3dd93299beda281929c2bd
INFO 2018-10-05 09:07:35,487 AlertSchedulerHandler.py:230 - [AlertScheduler] Reschedule Summary: 0 rescheduled, 0 unscheduled
INFO 2018-10-05 09:07:35,488 Controller.py:516 - Registration response from master was OK
INFO 2018-10-05 09:07:35,488 Controller.py:521 - Resetting ActionQueue...
INFO 2018-10-05 09:07:45,491 Controller.py:304 - Heartbeat (response id = 0) with server is running...
INFO 2018-10-05 09:07:45,491 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:07:45,495 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:07:45,630 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:07:45,901 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:07:45,907 Controller.py:320 - Sending Heartbeat (id = 0)
INFO 2018-10-05 09:07:45,911 Controller.py:333 - Heartbeat response received (id = 1)
INFO 2018-10-05 09:07:45,911 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:07:45,911 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:07:45,911 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:07:45,911 Controller.py:406 - Adding recovery commands
INFO 2018-10-05 09:07:45,912 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:07:46,812 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:08:32,438 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:08:32,441 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:08:32,453 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:08:32,460 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:08:32,487 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2018-10-05 09:08:32,525 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:08:32,525 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:08:32,541 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
INFO 2018-10-05 09:08:45,969 Controller.py:304 - Heartbeat (response id = 66) with server is running...
INFO 2018-10-05 09:08:45,970 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:08:45,974 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:08:46,130 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:08:46,372 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:08:46,375 Controller.py:320 - Sending Heartbeat (id = 66)
INFO 2018-10-05 09:08:46,379 Controller.py:333 - Heartbeat response received (id = 67)
INFO 2018-10-05 09:08:46,379 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:08:46,379 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:08:46,380 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:08:46,380 Controller.py:406 - Adding recovery commands
INFO 2018-10-05 09:08:46,380 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:08:47,281 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:08:48,454 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
ERROR 2018-10-05 09:09:32,454 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:09:32,468 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:09:32,487 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:09:32,492 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:09:32,499 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:09:32,502 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:09:32,504 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:09:32,506 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:09:32,521 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:09:32,526 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:09:32,528 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:09:32,530 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
INFO 2018-10-05 09:09:32,574 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:09:32,575 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:09:32,593 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
INFO 2018-10-05 09:09:46,467 Controller.py:304 - Heartbeat (response id = 132) with server is running...
INFO 2018-10-05 09:09:46,468 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:09:46,472 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:09:46,626 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:09:46,836 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:09:46,837 Controller.py:320 - Sending Heartbeat (id = 132)
INFO 2018-10-05 09:09:46,839 Controller.py:333 - Heartbeat response received (id = 133)
INFO 2018-10-05 09:09:46,839 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:09:46,839 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:09:46,839 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:09:46,839 Controller.py:406 - Adding recovery commands
INFO 2018-10-05 09:09:46,839 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:09:47,740 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:09:48,560 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
ERROR 2018-10-05 09:10:32,469 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:10:32,487 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:10:32,504 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:10:32,514 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:10:32,537 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2018-10-05 09:10:32,579 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:10:32,590 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:10:32,593 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:10:32,595 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:10:32,617 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:10:36,583 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']
ERROR 2018-10-05 09:10:39,683 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n
provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. 
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 
8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat 
org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 
19 more\n)'] ERROR 2018-10-05 09:10:39,683 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. 
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 
8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat 
org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)'] INFO 2018-10-05 09:10:46,871 Controller.py:304 - Heartbeat (response id = 198) with server is running... INFO 2018-10-05 09:10:46,871 Controller.py:311 - Building heartbeat message INFO 2018-10-05 09:10:46,876 Heartbeat.py:87 - Adding host info/state to heartbeat message. INFO 2018-10-05 09:10:47,000 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. 
INFO 2018-10-05 09:10:47,189 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:10:47,191 Controller.py:320 - Sending Heartbeat (id = 198)
INFO 2018-10-05 09:10:47,193 Controller.py:333 - Heartbeat response received (id = 199)
INFO 2018-10-05 09:10:47,193 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:10:47,193 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:10:47,193 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:10:47,193 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:10:48,094 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:10:48,527 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
ERROR 2018-10-05 09:11:32,484 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:11:32,498 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:11:32,504 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:11:32,509 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:11:32,519 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:11:32,520 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:11:32,522 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:11:32,528 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:11:32,544 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:11:32,557 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:11:32,560 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:11:32,568 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
INFO 2018-10-05 09:11:32,617 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:11:32,618 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:11:32,620 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
INFO 2018-10-05 09:11:47,240 Controller.py:304 - Heartbeat (response id = 264) with server is running...
INFO 2018-10-05 09:11:47,241 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:11:47,244 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:11:47,387 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:11:47,615 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:11:47,617 Controller.py:320 - Sending Heartbeat (id = 264)
INFO 2018-10-05 09:11:47,619 Controller.py:333 - Heartbeat response received (id = 265)
INFO 2018-10-05 09:11:47,619 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:11:47,619 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:11:47,619 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:11:47,619 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:11:48,519 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:11:48,525 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
WARNING 2018-10-05 09:12:32,471 base_alert.py:138 - [Alert][hbase_master_cpu] Unable to execute alert. [Alert][hbase_master_cpu] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:12:32,482 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:12:32,488 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:12:32,494 base_alert.py:138 - [Alert][yarn_resourcemanager_cpu] Unable to execute alert. [Alert][yarn_resourcemanager_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:12:32,499 base_alert.py:138 - [Alert][yarn_resourcemanager_rpc_latency] Unable to execute alert. [Alert][yarn_resourcemanager_rpc_latency] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:12:32,512 base_alert.py:138 - [Alert][ams_metrics_collector_hbase_master_cpu] Unable to execute alert. [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:12:32,518 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:12:32,524 base_alert.py:138 - [Alert][namenode_cpu] Unable to execute alert. [Alert][namenode_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:12:32,525 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:12:32,541 base_alert.py:138 - [Alert][namenode_service_rpc_processing_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:12:32,544 base_alert.py:138 - [Alert][namenode_client_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:12:32,545 base_alert.py:138 - [Alert][namenode_client_rpc_processing_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:12:32,550 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:12:32,552 base_alert.py:138 - [Alert][namenode_service_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:12:32,579 base_alert.py:138 - [Alert][mapreduce_history_server_rpc_latency] Unable to execute alert. [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:12:32,579 base_alert.py:138 - [Alert][mapreduce_history_server_cpu] Unable to execute alert. [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:12:32,590 logger.py:71 - Cannot find the stack name in the command. Stack tools cannot be loaded
Stack tools cannot be loaded INFO 2018-10-05 09:12:32,591 logger.py:75 - call[('ambari-python-wrap', None, 'versions')] {} INFO 2018-10-05 09:12:32,591 logger.py:75 - call[('ambari-python-wrap', None, 'versions')] {} INFO 2018-10-05 09:12:32,612 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist INFO 2018-10-05 09:12:32,612 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist ERROR 2018-10-05 09:12:32,613 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:12:32,613 logger.py:75 - call returned (1, "/bin/ambari-python-wrap: can't find '__main__' module in ''") ERROR 2018-10-05 09:12:32,613 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:12:32,613 logger.py:75 - call returned (1, "/bin/ambari-python-wrap: can't find '__main__' module in ''") ERROR 2018-10-05 09:12:32,613 script_alert.py:123 - [Alert][ambari_agent_version_select] Failed with result CRITICAL: ["hdp-select could not properly read /usr/hdp. Check this directory for unexpected contents.\n/bin/ambari-python-wrap: can't find '__main__' module in ''"] ERROR 2018-10-05 09:12:32,613 script_alert.py:123 - [Alert][ambari_agent_version_select] Failed with result CRITICAL: ["hdp-select could not properly read /usr/hdp. 
Check this directory for unexpected contents.\n/bin/ambari-python-wrap: can't find '__main__' module in ''"] INFO 2018-10-05 09:12:32,618 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} INFO 2018-10-05 09:12:32,618 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} INFO 2018-10-05 09:12:47,706 Controller.py:304 - Heartbeat (response id = 330) with server is running... INFO 2018-10-05 09:12:47,707 Controller.py:311 - Building heartbeat message INFO 2018-10-05 09:12:47,711 Heartbeat.py:87 - Adding host info/state to heartbeat message. INFO 2018-10-05 09:12:47,853 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. INFO 2018-10-05 09:12:47,853 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. INFO 2018-10-05 09:12:48,042 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0 INFO 2018-10-05 09:12:48,044 Controller.py:320 - Sending Heartbeat (id = 330) INFO 2018-10-05 09:12:48,045 Controller.py:333 - Heartbeat response received (id = 331) INFO 2018-10-05 09:12:48,046 Controller.py:342 - Heartbeat interval is 1 seconds INFO 2018-10-05 09:12:48,046 Controller.py:380 - Updating configurations from heartbeat INFO 2018-10-05 09:12:48,046 Controller.py:389 - Adding cancel/execution commands INFO 2018-10-05 09:12:48,046 Controller.py:475 - Waiting 0.9 for next heartbeat ERROR 2018-10-05 09:12:48,539 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. 
Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:12:48,946 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:13:32,473 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:13:32,481 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:13:32,504 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:13:32,508 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:13:32,515 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:13:32,518 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:13:32,521 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:13:32,535 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:13:32,540 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:13:32,545 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert.
[Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:13:32,547 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:13:32,550 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
INFO 2018-10-05 09:13:32,579 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:13:32,601 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:13:32,601 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:13:32,603 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:13:32,637 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:13:36,707 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']
ERROR 2018-10-05 09:13:39,591 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep,
timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. 
Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat 
org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)'] ERROR 2018-10-05 09:13:39,591 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = 
function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. 
Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat 
org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)']
INFO 2018-10-05 09:13:48,098 Controller.py:304 - Heartbeat (response id = 396) with server is running...
INFO 2018-10-05 09:13:48,099 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:13:48,103 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:13:48,234 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:13:48,432 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:13:48,434 Controller.py:320 - Sending Heartbeat (id = 396)
INFO 2018-10-05 09:13:48,436 Controller.py:333 - Heartbeat response received (id = 397)
INFO 2018-10-05 09:13:48,436 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:13:48,436 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:13:48,437 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:13:48,437 Controller.py:475 - Waiting 0.9 for next heartbeat
ERROR 2018-10-05 09:13:48,555 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:13:49,337 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:14:32,464 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:14:32,482 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:14:32,497 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:14:32,506 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:14:32,527 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2018-10-05 09:14:32,572 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:14:32,573 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:14:32,591 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
INFO 2018-10-05 09:14:48,506 Controller.py:304 - Heartbeat (response id = 462) with server is running...
INFO 2018-10-05 09:14:48,507 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:14:48,511 Heartbeat.py:87 - Adding host info/state to heartbeat message.
ERROR 2018-10-05 09:14:48,536 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:14:48,611 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:14:48,791 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:14:48,792 Controller.py:320 - Sending Heartbeat (id = 462)
INFO 2018-10-05 09:14:48,794 Controller.py:333 - Heartbeat response received (id = 463)
INFO 2018-10-05 09:14:48,794 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:14:48,794 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:14:48,794 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:14:48,794 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:14:49,695 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:15:32,481 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:15:32,494 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:15:32,501 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:15:32,505 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:15:32,510 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:15:32,514 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:15:32,516 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:15:32,520 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:15:32,537 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:15:32,542 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:15:32,546 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:15:32,546 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
INFO 2018-10-05 09:15:32,585 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:15:32,587 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:15:32,598 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:15:48,556 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:15:48,857 Controller.py:304 - Heartbeat (response id = 528) with server is running...
INFO 2018-10-05 09:15:48,858 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:15:48,862 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:15:48,996 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
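The same handful of alerts repeats every check interval, and the agent prints each record twice, which makes the log hard to skim. When triaging output like this, it can help to collapse it to the distinct failing alerts; a small sketch (a hypothetical helper, not part of Ambari) keyed on the `[Alert][name] Failed with result CRITICAL` marker visible in the records above:

```python
import re

def distinct_critical_alerts(log_text):
    """Collapse repeated agent-log records down to the distinct alert
    names that reported 'Failed with result CRITICAL', in first-seen order."""
    names = re.findall(r"\[Alert\]\[(\w+)\] Failed with result CRITICAL", log_text)
    return list(dict.fromkeys(names))  # dedupe while preserving order

# Abbreviated sample in the same shape as the records above:
sample = (
    "ERROR ... [Alert][oozie_server_status] Failed with result CRITICAL: [...] "
    "ERROR ... [Alert][oozie_server_status] Failed with result CRITICAL: [...] "
    "ERROR ... [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: [...]"
)
print(distinct_critical_alerts(sample))
# → ['oozie_server_status', 'ams_metrics_monitor_process']
```

Run against the full log, this reduces hundreds of lines to the few services that actually need restarting.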
INFO 2018-10-05 09:15:49,235 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0 INFO 2018-10-05 09:15:49,236 Controller.py:320 - Sending Heartbeat (id = 528) INFO 2018-10-05 09:15:49,238 Controller.py:333 - Heartbeat response received (id = 529) INFO 2018-10-05 09:15:49,238 Controller.py:342 - Heartbeat interval is 1 seconds INFO 2018-10-05 09:15:49,238 Controller.py:380 - Updating configurations from heartbeat INFO 2018-10-05 09:15:49,238 Controller.py:389 - Adding cancel/execution commands INFO 2018-10-05 09:15:49,239 Controller.py:475 - Waiting 0.9 for next heartbeat INFO 2018-10-05 09:15:50,139 Controller.py:482 - Wait for next heartbeat over ERROR 2018-10-05 09:16:32,479 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n'] ERROR 2018-10-05 09:16:32,479 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call 
last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n'] ERROR 2018-10-05 09:16:32,500 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)'] ERROR 2018-10-05 09:16:32,500 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback 
(most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)'] WARNING 2018-10-05 09:16:32,515 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__' WARNING 2018-10-05 09:16:32,521 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response WARNING 2018-10-05 09:16:32,546 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response INFO 2018-10-05 09:16:32,583 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'} INFO 2018-10-05 09:16:32,583 logger.py:75 - Execute['! 
beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'} INFO 2018-10-05 09:16:32,593 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'} INFO 2018-10-05 09:16:32,593 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'} INFO 2018-10-05 09:16:32,617 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist INFO 2018-10-05 09:16:32,617 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist ERROR 2018-10-05 09:16:32,617 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] ERROR 2018-10-05 09:16:32,617 script_alert.py:123 - [Alert][ams_metrics_monitor_process] 
Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:16:32,627 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} INFO 2018-10-05 09:16:32,627 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} ERROR 2018-10-05 09:16:36,751 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File 
"/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']
ERROR 2018-10-05 09:16:39,608 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n
provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. 
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 
8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat 
org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 
19 more\n)']
ERROR 2018-10-05 09:16:48,591 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec.
Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:16:49,279 Controller.py:304 - Heartbeat (response id = 594) with server is running...
INFO 2018-10-05 09:16:49,279 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:16:49,284 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:16:49,447 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
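Every CRITICAL alert up to this point is the same underlying failure: `java.net.ConnectException: Connection refused` against HiveServer2 (port 10000), the Hive Metastore (port 9083), and Oozie (port 11000) on master.hadoop.com, which means nothing is accepting connections on those ports rather than the checks being misconfigured. A minimal sketch for confirming which of those ports actually have a listener (host and ports are taken from the alerts above; the `is_listening` helper itself is not part of Ambari):

```python
import socket

# Host/port pairs flagged CRITICAL in the log above
SERVICES = {
    "HiveServer2": ("master.hadoop.com", 10000),
    "Hive Metastore": ("master.hadoop.com", 9083),
    "Oozie": ("master.hadoop.com", 11000),
}

def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts, DNS failures
        return False

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "listening" if is_listening(host, port) else "refused/unreachable"
        print("%-15s %s:%d  %s" % (name, host, port, state))
```

If all three report refused/unreachable, the daemons are simply down and the alerts above are a symptom, not the root problem.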
INFO 2018-10-05 09:16:49,690 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:16:49,691 Controller.py:320 - Sending Heartbeat (id = 594)
INFO 2018-10-05 09:16:49,693 Controller.py:333 - Heartbeat response received (id = 595)
INFO 2018-10-05 09:16:49,693 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:16:49,693 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:16:49,693 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:16:49,693 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:16:50,593 Controller.py:482 - Wait for next heartbeat over
WARNING 2018-10-05 09:17:32,455 base_alert.py:138 - [Alert][hbase_master_cpu] Unable to execute alert. [Alert][hbase_master_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:17:32,477 base_alert.py:138 - [Alert][yarn_resourcemanager_cpu] Unable to execute alert. [Alert][yarn_resourcemanager_cpu] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:17:32,478 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection,
req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:17:32,480 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:17:32,485 base_alert.py:138 - [Alert][yarn_resourcemanager_rpc_latency] Unable to execute alert. [Alert][yarn_resourcemanager_rpc_latency] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:17:32,502 base_alert.py:138 - [Alert][ams_metrics_collector_hbase_master_cpu] Unable to execute alert. [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:17:32,509 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:17:32,510 base_alert.py:138 - [Alert][namenode_cpu] Unable to execute alert.
[Alert][namenode_cpu] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,514 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,517 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,524 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response ERROR 2018-10-05 09:17:32,525 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n'] ERROR 2018-10-05 09:17:32,525 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n'] WARNING 2018-10-05 09:17:32,531 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,545 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,547 base_alert.py:138 - [Alert][namenode_service_rpc_processing_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:17:32,557 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,570 base_alert.py:138 - [Alert][namenode_client_rpc_processing_latency_hourly] Unable to execute alert. 
Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:17:32,570 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,572 base_alert.py:138 - [Alert][namenode_client_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:17:32,572 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,574 base_alert.py:138 - [Alert][namenode_service_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:17:32,585 base_alert.py:138 - [Alert][mapreduce_history_server_rpc_latency] Unable to execute alert. [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,586 base_alert.py:138 - [Alert][mapreduce_history_server_cpu] Unable to execute alert. [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response WARNING 2018-10-05 09:17:32,591 logger.py:71 - Cannot find the stack name in the command. Stack tools cannot be loaded WARNING 2018-10-05 09:17:32,591 logger.py:71 - Cannot find the stack name in the command. 
Stack tools cannot be loaded INFO 2018-10-05 09:17:32,591 logger.py:75 - call[('ambari-python-wrap', None, 'versions')] {} INFO 2018-10-05 09:17:32,591 logger.py:75 - call[('ambari-python-wrap', None, 'versions')] {} INFO 2018-10-05 09:17:32,613 logger.py:75 - call returned (1, "/bin/ambari-python-wrap: can't find '__main__' module in ''") INFO 2018-10-05 09:17:32,613 logger.py:75 - call returned (1, "/bin/ambari-python-wrap: can't find '__main__' module in ''") ERROR 2018-10-05 09:17:32,613 script_alert.py:123 - [Alert][ambari_agent_version_select] Failed with result CRITICAL: ["hdp-select could not properly read /usr/hdp. Check this directory for unexpected contents.\n/bin/ambari-python-wrap: can't find '__main__' module in ''"] ERROR 2018-10-05 09:17:32,613 script_alert.py:123 - [Alert][ambari_agent_version_select] Failed with result CRITICAL: ["hdp-select could not properly read /usr/hdp. Check this directory for unexpected contents.\n/bin/ambari-python-wrap: can't find '__main__' module in ''"] INFO 2018-10-05 09:17:32,616 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist INFO 2018-10-05 09:17:32,616 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist ERROR 2018-10-05 09:17:32,617 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] ERROR 2018-10-05 09:17:32,617 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:17:32,619 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} INFO 2018-10-05 09:17:32,619 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie 
http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} ERROR 2018-10-05 09:17:48,616 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"] ERROR 2018-10-05 09:17:48,616 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"] INFO 2018-10-05 09:17:49,761 Controller.py:304 - Heartbeat (response id = 660) with server is running... 
INFO 2018-10-05 09:17:49,761 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:17:49,765 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:17:49,912 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:17:50,107 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:17:50,108 Controller.py:320 - Sending Heartbeat (id = 660)
INFO 2018-10-05 09:17:50,110 Controller.py:333 - Heartbeat response received (id = 661)
INFO 2018-10-05 09:17:50,110 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:17:50,110 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:17:50,110 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:17:50,110 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:17:51,011 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:18:32,486 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:18:32,496 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:18:32,516 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:18:32,522 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:18:32,541 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2018-10-05 09:18:32,584 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:18:32,584 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:18:32,601 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:18:48,524 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:18:50,175 Controller.py:304 - Heartbeat (response id = 726) with server is running...
INFO 2018-10-05 09:18:50,176 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:18:50,180 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:18:50,312 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:18:50,525 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:18:50,527 Controller.py:320 - Sending Heartbeat (id = 726)
INFO 2018-10-05 09:18:50,530 Controller.py:333 - Heartbeat response received (id = 727)
INFO 2018-10-05 09:18:50,530 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:18:50,531 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:18:50,531 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:18:50,531 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:18:51,432 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:19:32,470 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:19:32,483 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:19:32,491 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:19:32,501 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:19:32,506 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:19:32,510 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:19:32,512 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:19:32,527 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:19:32,533 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:19:32,537 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:19:32,539 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:19:32,544 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
INFO 2018-10-05 09:19:32,580 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:19:32,591 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:19:32,594 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:19:32,611 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:19:32,621 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:19:36,665 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']
ERROR 2018-10-05 09:19:39,676 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)']
org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)'] ERROR 2018-10-05 09:19:48,526 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. 
Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"] ERROR 2018-10-05 09:19:48,526 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"] INFO 2018-10-05 09:19:50,567 Controller.py:304 - Heartbeat (response id = 792) with server is running... INFO 2018-10-05 09:19:50,567 Controller.py:311 - Building heartbeat message INFO 2018-10-05 09:19:50,572 Heartbeat.py:87 - Adding host info/state to heartbeat message. INFO 2018-10-05 09:19:50,699 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. INFO 2018-10-05 09:19:50,699 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. 
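Every CRITICAL alert above bottoms out in the same symptom, `java.net.ConnectException: Connection refused` — nothing is accepting TCP connections on the target ports. A minimal reachability sketch (Python 3 here, even though the agent itself runs Python 2; the host and ports are copied from the alert messages, and `check_port` is an illustrative helper, not Ambari code):

```python
import socket

# Endpoints taken from the alerts in this log; adjust for your cluster.
SERVICES = {
    "hive-metastore (thrift)": ("master.hadoop.com", 9083),
    "oozie": ("master.hadoop.com", 11000),
    "webhcat": ("master.hadoop.com", 50111),
    "nodemanager": ("master.hadoop.com", 8042),
}

def check_port(host, port, timeout=3.0):
    """Return True if a plain TCP connect succeeds, False on refusal/timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it from the Ambari host, e.g. `for name, (h, p) in SERVICES.items(): print(name, check_port(h, p))` — a `False` for every service listed here would confirm the processes are simply not running, rather than a firewall or DNS issue on a subset of ports.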
INFO 2018-10-05 09:19:50,897 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:19:50,898 Controller.py:320 - Sending Heartbeat (id = 792)
INFO 2018-10-05 09:19:50,900 Controller.py:333 - Heartbeat response received (id = 793)
INFO 2018-10-05 09:19:50,900 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:19:50,901 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:19:50,901 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:19:50,901 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:19:51,802 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:20:32,465 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:20:32,468 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:20:32,494 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:20:32,499 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:20:32,519 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2018-10-05 09:20:32,566 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:20:32,566 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:20:32,582 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:20:48,534 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:20:50,968 Controller.py:304 - Heartbeat (response id = 858) with server is running...
INFO 2018-10-05 09:20:50,969 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:20:50,973 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:20:51,121 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
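The `Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist` line explains the `ams_metrics_monitor_process` CRITICAL: the alert decides liveness from that pid file. The same check can be reproduced standalone as a sketch (Python 3; `pid_alive` is a made-up helper name, not the agent's actual code):

```python
import os

def pid_alive(pid_file):
    """Mimic the alert: read a pid file and probe the process with signal 0."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # file missing, unreadable, or empty -> monitor "NOT running"
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
        return True
    except OSError:
        return False
```

For example, `pid_alive("/var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid")` returning `False` matches the CRITICAL above; restarting Ambari Metrics Monitor should repopulate the pid file.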
INFO 2018-10-05 09:20:51,383 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:20:51,384 Controller.py:320 - Sending Heartbeat (id = 858)
INFO 2018-10-05 09:20:51,385 Controller.py:333 - Heartbeat response received (id = 859)
INFO 2018-10-05 09:20:51,385 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:20:51,385 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:20:51,386 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:20:51,386 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:20:52,286 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:21:32,467 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:21:32,469 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:21:32,491 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:21:32,498 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:21:32,501 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:21:32,507 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:21:32,509 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:21:32,510 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:21:32,519 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:21:32,524 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:21:32,527 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:21:32,527 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
INFO 2018-10-05 09:21:32,558 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:21:32,559 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:21:32,581 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:21:48,508 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:21:51,434 Controller.py:304 - Heartbeat (response id = 924) with server is running...
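The many `Unable to extract JSON from JMX response` warnings mean the agent reached (or tried to reach) a `/jmx` endpoint but got back something `json.loads` rejects: an empty body, an HTML error page, or no response at all. A small probe can show which case applies (Python 3 sketch; `probe_jmx` is a hypothetical helper, and the example URL pattern is the standard Hadoop JMX servlet with the hostname taken from this log):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def probe_jmx(url, timeout=5.0):
    """Fetch a Hadoop /jmx endpoint; report whether the body parses as JSON."""
    try:
        body = urlopen(url, timeout=timeout).read().decode("utf-8", "replace")
    except URLError as exc:
        return ("unreachable", str(exc))
    try:
        json.loads(body)
        return ("ok", body[:80])
    except ValueError:
        # First bytes usually reveal an HTML error page or an empty response.
        return ("not-json", body[:80])

# e.g. probe_jmx("http://master.hadoop.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem")
```

Given that every other alert here is a connection refusal, the expected result on this cluster would be `("unreachable", ...)` for the NameNode, DataNode, and HBase Master JMX ports alike.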
INFO 2018-10-05 09:21:51,435 Controller.py:311 - Building heartbeat message INFO 2018-10-05 09:21:51,439 Heartbeat.py:87 - Adding host info/state to heartbeat message. INFO 2018-10-05 09:21:51,607 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. INFO 2018-10-05 09:21:51,607 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. INFO 2018-10-05 09:21:51,855 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0 INFO 2018-10-05 09:21:51,857 Controller.py:320 - Sending Heartbeat (id = 924) INFO 2018-10-05 09:21:51,859 Controller.py:333 - Heartbeat response received (id = 925) INFO 2018-10-05 09:21:51,859 Controller.py:342 - Heartbeat interval is 1 seconds INFO 2018-10-05 09:21:51,859 Controller.py:380 - Updating configurations from heartbeat INFO 2018-10-05 09:21:51,860 Controller.py:389 - Adding cancel/execution commands INFO 2018-10-05 09:21:51,860 Controller.py:475 - Waiting 0.9 for next heartbeat INFO 2018-10-05 09:21:52,760 Controller.py:482 - Wait for next heartbeat over WARNING 2018-10-05 09:22:32,460 base_alert.py:138 - [Alert][hbase_master_cpu] Unable to execute alert. 
[Alert][hbase_master_cpu] Unable to extract JSON from JMX response ERROR 2018-10-05 09:22:32,477 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n'] ERROR 2018-10-05 09:22:32,477 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n 
return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n'] WARNING 2018-10-05 09:22:32,489 base_alert.py:138 - [Alert][yarn_resourcemanager_cpu] Unable to execute alert. [Alert][yarn_resourcemanager_cpu] Unable to extract JSON from JMX response ERROR 2018-10-05 09:22:32,490 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)'] ERROR 2018-10-05 09:22:32,490 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File 
"/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)'] WARNING 2018-10-05 09:22:32,494 base_alert.py:138 - [Alert][yarn_resourcemanager_rpc_latency] Unable to execute alert. [Alert][yarn_resourcemanager_rpc_latency] Unable to extract JSON from JMX response WARNING 2018-10-05 09:22:32,512 base_alert.py:138 - [Alert][ams_metrics_collector_hbase_master_cpu] Unable to execute alert. [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response WARNING 2018-10-05 09:22:32,515 base_alert.py:138 - [Alert][namenode_cpu] Unable to execute alert. [Alert][namenode_cpu] Unable to extract JSON from JMX response WARNING 2018-10-05 09:22:32,516 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__' WARNING 2018-10-05 09:22:32,524 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response WARNING 2018-10-05 09:22:32,541 base_alert.py:138 - [Alert][namenode_service_rpc_processing_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:22:32,546 base_alert.py:138 - [Alert][namenode_client_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:22:32,549 base_alert.py:138 - [Alert][namenode_client_rpc_processing_latency_hourly] Unable to execute alert. 
Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:22:32,552 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:22:32,553 base_alert.py:138 - [Alert][namenode_service_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable
WARNING 2018-10-05 09:22:32,586 base_alert.py:138 - [Alert][mapreduce_history_server_rpc_latency] Unable to execute alert. [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:22:32,588 base_alert.py:138 - [Alert][mapreduce_history_server_cpu] Unable to execute alert. [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response
INFO 2018-10-05 09:22:32,595 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
WARNING 2018-10-05 09:22:32,605 logger.py:71 - Cannot find the stack name in the command. Stack tools cannot be loaded
INFO 2018-10-05 09:22:32,606 logger.py:75 - call[('ambari-python-wrap', None, 'versions')] {}
INFO 2018-10-05 09:22:32,619 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:22:32,620 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:22:32,631 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:22:32,631 logger.py:75 - call returned (1, "/bin/ambari-python-wrap: can't find '__main__' module in ''")
ERROR 2018-10-05 09:22:32,632 script_alert.py:123 - [Alert][ambari_agent_version_select] Failed with result CRITICAL: ["hdp-select could not properly read /usr/hdp. Check this directory for unexpected contents.\n/bin/ambari-python-wrap: can't find '__main__' module in ''"]
INFO 2018-10-05 09:22:32,638 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:22:36,670 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']
ERROR 2018-10-05 09:22:39,641 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep,
timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. 
Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat 
org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)']
ERROR 2018-10-05 09:22:48,556 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4.
Exception = Connection refused"]
INFO 2018-10-05 09:22:51,915 Controller.py:304 - Heartbeat (response id = 990) with server is running...
INFO 2018-10-05 09:22:51,916 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:22:51,920 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:22:52,052 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:22:52,239 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:22:52,240 Controller.py:320 - Sending Heartbeat (id = 990)
INFO 2018-10-05 09:22:52,241 Controller.py:333 - Heartbeat response received (id = 991)
INFO 2018-10-05 09:22:52,241 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:22:52,242 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:22:52,242 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:22:52,242 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:22:53,142 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:23:32,480 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:23:32,495 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:23:32,516 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:23:32,517 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:23:32,525 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:23:32,528 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:23:32,529 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert.
[Alert][datanode_health_summary] Unable to extract JSON from JMX response WARNING 2018-10-05 09:23:32,531 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response WARNING 2018-10-05 09:23:32,543 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response WARNING 2018-10-05 09:23:32,547 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response WARNING 2018-10-05 09:23:32,551 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response WARNING 2018-10-05 09:23:32,554 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response INFO 2018-10-05 09:23:32,594 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist INFO 2018-10-05 09:23:32,594 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist ERROR 2018-10-05 09:23:32,595 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] ERROR 2018-10-05 09:23:32,595 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:23:32,608 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} INFO 2018-10-05 09:23:32,608 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie 
http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} ERROR 2018-10-05 09:23:48,509 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"] ERROR 2018-10-05 09:23:48,509 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"] INFO 2018-10-05 09:23:52,313 Controller.py:304 - Heartbeat (response id = 1056) with server is running... 
INFO 2018-10-05 09:23:52,313 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:23:52,318 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:23:52,478 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:23:52,725 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:23:52,726 Controller.py:320 - Sending Heartbeat (id = 1056)
INFO 2018-10-05 09:23:52,728 Controller.py:333 - Heartbeat response received (id = 1057)
INFO 2018-10-05 09:23:52,728 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:23:52,728 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:23:52,728 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:23:52,728 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:23:53,629 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:24:32,462 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:24:32,473 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:24:32,500 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:24:32,507 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:24:32,529 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
INFO 2018-10-05 09:24:32,573 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:24:32,574 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:24:32,590 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:24:48,524 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"]
INFO 2018-10-05 09:24:52,789 Controller.py:304 - Heartbeat (response id = 1122) with server is running...
INFO 2018-10-05 09:24:52,790 Controller.py:311 - Building heartbeat message
INFO 2018-10-05 09:24:52,794 Heartbeat.py:87 - Adding host info/state to heartbeat message.
INFO 2018-10-05 09:24:52,922 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:24:53,715 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0
INFO 2018-10-05 09:24:53,717 Controller.py:320 - Sending Heartbeat (id = 1122)
INFO 2018-10-05 09:24:53,719 Controller.py:333 - Heartbeat response received (id = 1123)
INFO 2018-10-05 09:24:53,719 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2018-10-05 09:24:53,719 Controller.py:380 - Updating configurations from heartbeat
INFO 2018-10-05 09:24:53,720 Controller.py:389 - Adding cancel/execution commands
INFO 2018-10-05 09:24:53,720 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2018-10-05 09:24:54,620 Controller.py:482 - Wait for next heartbeat over
ERROR 2018-10-05 09:25:32,466 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n']
ERROR 2018-10-05 09:25:32,479 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)']
WARNING 2018-10-05 09:25:32,484 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__'
WARNING 2018-10-05 09:25:32,487 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:25:32,491 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:25:32,492 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response
ERROR 2018-10-05 09:25:32,494 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n']
WARNING 2018-10-05 09:25:32,504 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:25:32,508 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:25:32,512 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:25:32,516 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. [Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response
WARNING 2018-10-05 09:25:32,517 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response
INFO 2018-10-05 09:25:32,564 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:25:32,571 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'}
INFO 2018-10-05 09:25:32,578 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2018-10-05 09:25:32,578 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com']
INFO 2018-10-05 09:25:32,584 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'}
ERROR 2018-10-05 09:25:36,806 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']
ERROR 2018-10-05 09:25:39,842 script_alert.py:123 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on master.hadoop.com failed (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep,
timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf\' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.\n\nLogging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:569)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\t... 8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. 
Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3556)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3588)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat 
org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)'] ERROR 2018-10-05 09:25:48,492 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4.
Exception = Connection refused"] INFO 2018-10-05 09:25:53,688 Controller.py:304 - Heartbeat (response id = 1188) with server is running... INFO 2018-10-05 09:25:53,689 Controller.py:311 - Building heartbeat message INFO 2018-10-05 09:25:53,693 Heartbeat.py:87 - Adding host info/state to heartbeat message. INFO 2018-10-05 09:25:53,833 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:25:54,027 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0 INFO 2018-10-05 09:25:54,028 Controller.py:320 - Sending Heartbeat (id = 1188) INFO 2018-10-05 09:25:54,029 Controller.py:333 - Heartbeat response received (id = 1189) INFO 2018-10-05 09:25:54,029 Controller.py:342 - Heartbeat interval is 1 seconds INFO 2018-10-05 09:25:54,030 Controller.py:380 - Updating configurations from heartbeat INFO 2018-10-05 09:25:54,030 Controller.py:389 - Adding cancel/execution commands INFO 2018-10-05 09:25:54,030 Controller.py:475 - Waiting 0.9 for next heartbeat INFO 2018-10-05 09:25:54,930 Controller.py:482 - Wait for next heartbeat over ERROR 2018-10-05 09:26:32,449 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n'] ERROR 2018-10-05 09:26:32,454 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File 
"/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)'] WARNING 2018-10-05 09:26:32,496 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__' WARNING 2018-10-05 09:26:32,501 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response WARNING 2018-10-05 09:26:32,518 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert.
[Alert][namenode_directory_status] Unable to extract JSON from JMX response INFO 2018-10-05 09:26:32,563 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist ERROR 2018-10-05 09:26:32,564 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:26:32,565 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} ERROR 2018-10-05 09:26:48,501 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"] INFO 2018-10-05 09:26:54,011 Controller.py:304 - Heartbeat (response id = 1254) with server is running... INFO 2018-10-05 09:26:54,011 Controller.py:311 - Building heartbeat message INFO 2018-10-05 09:26:54,016 Heartbeat.py:87 - Adding host info/state to heartbeat message. INFO 2018-10-05 09:26:54,152 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-10-05 09:26:54,353 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0 INFO 2018-10-05 09:26:54,354 Controller.py:320 - Sending Heartbeat (id = 1254) INFO 2018-10-05 09:26:54,356 Controller.py:333 - Heartbeat response received (id = 1255) INFO 2018-10-05 09:26:54,356 Controller.py:342 - Heartbeat interval is 1 seconds INFO 2018-10-05 09:26:54,356 Controller.py:380 - Updating configurations from heartbeat INFO 2018-10-05 09:26:54,356 Controller.py:389 - Adding cancel/execution commands INFO 2018-10-05 09:26:54,356 Controller.py:475 - Waiting 0.9 for next heartbeat INFO 2018-10-05 09:26:55,257 Controller.py:482 - Wait for next heartbeat over WARNING 2018-10-05 09:27:32,481 base_alert.py:138 - [Alert][hbase_master_cpu] Unable to execute alert. [Alert][hbase_master_cpu] Unable to extract JSON from JMX response ERROR 2018-10-05 09:27:32,491 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n'] ERROR 2018-10-05 09:27:32,497 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)'] WARNING 2018-10-05 09:27:32,505 base_alert.py:138 - [Alert][yarn_resourcemanager_cpu] Unable to execute alert. [Alert][yarn_resourcemanager_cpu] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,511 base_alert.py:138 - [Alert][yarn_resourcemanager_rpc_latency] Unable to execute alert. [Alert][yarn_resourcemanager_rpc_latency] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,524 base_alert.py:138 - [Alert][ams_metrics_collector_hbase_master_cpu] Unable to execute alert. [Alert][ams_metrics_collector_hbase_master_cpu] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,530 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__' WARNING 2018-10-05 09:27:32,531 base_alert.py:138 - [Alert][namenode_cpu] Unable to execute alert.
[Alert][namenode_cpu] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,534 base_alert.py:138 - [Alert][namenode_hdfs_pending_deletion_blocks] Unable to execute alert. [Alert][namenode_hdfs_pending_deletion_blocks] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,537 base_alert.py:138 - [Alert][datanode_heap_usage] Unable to execute alert. [Alert][datanode_heap_usage] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,542 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response ERROR 2018-10-05 09:27:32,547 script_alert.py:123 - [Alert][datanode_unmounted_data_dir] Failed with result CRITICAL: ['The following data dir(s) were not found: /hadoop/hdfs/data\n/root/hadoop/hdfs/data\n'] WARNING 2018-10-05 09:27:32,547 base_alert.py:138 - [Alert][namenode_hdfs_blocks_health] Unable to execute alert. [Alert][namenode_hdfs_blocks_health] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,565 base_alert.py:138 - [Alert][datanode_storage] Unable to execute alert. [Alert][datanode_storage] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,569 base_alert.py:138 - [Alert][namenode_service_rpc_processing_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:27:32,574 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. [Alert][namenode_directory_status] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,587 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert.
[Alert][namenode_hdfs_capacity_utilization] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,597 base_alert.py:138 - [Alert][namenode_rpc_latency] Unable to execute alert. [Alert][namenode_rpc_latency] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,601 base_alert.py:138 - [Alert][namenode_client_rpc_processing_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:27:32,603 base_alert.py:138 - [Alert][namenode_client_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:27:32,606 base_alert.py:138 - [Alert][namenode_service_rpc_queue_latency_hourly] Unable to execute alert. Couldn't define hadoop_conf_dir: argument of type 'NoneType' is not iterable WARNING 2018-10-05 09:27:32,609 base_alert.py:138 - [Alert][mapreduce_history_server_rpc_latency] Unable to execute alert. [Alert][mapreduce_history_server_rpc_latency] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,611 base_alert.py:138 - [Alert][mapreduce_history_server_cpu] Unable to execute alert. [Alert][mapreduce_history_server_cpu] Unable to extract JSON from JMX response WARNING 2018-10-05 09:27:32,620 logger.py:71 - Cannot find the stack name in the command. Stack tools cannot be loaded INFO 2018-10-05 09:27:32,620 logger.py:75 - call[('ambari-python-wrap', None, 'versions')] {} INFO 2018-10-05 09:27:32,643 logger.py:75 - call returned (1, "/bin/ambari-python-wrap: can't find '__main__' module in ''") ERROR 2018-10-05 09:27:32,645 script_alert.py:123 - [Alert][ambari_agent_version_select] Failed with result CRITICAL: ["hdp-select could not properly read /usr/hdp. Check this directory for unexpected contents.\n/bin/ambari-python-wrap: can't find '__main__' module in ''"] INFO 2018-10-05 09:27:32,656 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist ERROR 2018-10-05 09:27:32,656 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:27:32,659 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} ERROR 2018-10-05 09:27:48,588 script_alert.py:123 - [Alert][oozie_server_status] Failed with result CRITICAL: ["Execution of 'source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status' returned 255. Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 2 sec. Retry count = 2\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 4 sec. Retry count = 3\nConnection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 8 sec. Retry count = 4\nError: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused"] INFO 2018-10-05 09:27:54,403 Controller.py:304 - Heartbeat (response id = 1320) with server is running...
INFO 2018-10-05 09:27:54,404 Controller.py:311 - Building heartbeat message INFO 2018-10-05 09:27:54,408 Heartbeat.py:87 - Adding host info/state to heartbeat message. INFO 2018-10-05 09:27:54,539 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. INFO 2018-10-05 09:27:54,539 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length. INFO 2018-10-05 09:27:54,718 Hardware.py:188 - Some mount points were ignored: /dev/shm, /run, /sys/fs/cgroup, /run/user/42, /run/user/0 INFO 2018-10-05 09:27:54,719 Controller.py:320 - Sending Heartbeat (id = 1320) INFO 2018-10-05 09:27:54,720 Controller.py:333 - Heartbeat response received (id = 1321) INFO 2018-10-05 09:27:54,720 Controller.py:342 - Heartbeat interval is 1 seconds INFO 2018-10-05 09:27:54,720 Controller.py:380 - Updating configurations from heartbeat INFO 2018-10-05 09:27:54,721 Controller.py:389 - Adding cancel/execution commands INFO 2018-10-05 09:27:54,721 Controller.py:475 - Waiting 0.9 for next heartbeat INFO 2018-10-05 09:27:55,621 Controller.py:482 - Wait for next heartbeat over ERROR 2018-10-05 09:28:32,491 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return 
self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)'] ERROR 2018-10-05 09:28:32,491 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n)'] ERROR 2018-10-05 09:28:32,492 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File 
"/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n'] ERROR 2018-10-05 09:28:32,492 script_alert.py:123 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://master.hadoop.com:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n return self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: \n'] WARNING 2018-10-05 09:28:32,510 base_alert.py:138 - [Alert][zeppelin_server_status] Unable to execute alert. 'NoneType' object has no attribute '__getitem__' WARNING 2018-10-05 09:28:32,518 base_alert.py:138 - [Alert][datanode_health_summary] Unable to execute alert. [Alert][datanode_health_summary] Unable to extract JSON from JMX response WARNING 2018-10-05 09:28:32,538 base_alert.py:138 - [Alert][namenode_directory_status] Unable to execute alert. 
[Alert][namenode_directory_status] Unable to extract JSON from JMX response INFO 2018-10-05 09:28:32,570 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'} INFO 2018-10-05 09:28:32,570 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist INFO 2018-10-05 09:28:32,570 logger.py:75 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf' ; hive --hiveconf hive.metastore.uris=thrift://master.hadoop.com:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', u'/usr/hdp/current/hive-metastore/bin'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'} INFO 2018-10-05 09:28:32,570 logger.py:75 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist INFO 2018-10-05 09:28:32,577 logger.py:75 - Execute['! 
beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'} ERROR 2018-10-05 09:28:32,578 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:28:32,577 logger.py:75 - Execute['! beeline -u 'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'timeout_kill_strategy': 2, 'timeout': 60, 'user': 'ambari-qa'} ERROR 2018-10-05 09:28:32,578 script_alert.py:123 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on master.hadoop.com'] INFO 2018-10-05 09:28:32,614 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} INFO 2018-10-05 09:28:32,614 logger.py:75 - Execute['source /usr/hdp/current/oozie-server/conf/oozie-env.sh ; oozie admin -oozie http://master.hadoop.com:11000/oozie -status'] {'environment': None, 'user': 'oozie'} ERROR 2018-10-05 09:28:36,785 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File 
"/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. 
Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)'] ERROR 2018-10-05 09:28:36,785 script_alert.py:123 - [Alert][hive_server_process] Failed with result CRITICAL: ['Connection failed on host master.hadoop.com:10000 (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute\n ldap_password=ldap_password)\n File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl\n timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__\n self.env.run()\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n self.run_action(resource, action)\n File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n provider_action()\n File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n result = function(command, **kwargs)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n result = _call(command, **kwargs_copy)\n File 
"/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://master.hadoop.com:10000/;transportMode=binary\' -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\nError: Could not open client transport with JDBC Uri: jdbc:hive2://master.hadoop.com:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)\n)']