Created on 04-06-2020 04:16 PM - last edited on 04-06-2020 10:22 PM by VidyaSargur
I have the following HDF setup:
Ambari Server - AWS ASG (3 AZ)
Ambari Metrics (NiFi CA, Grafana, Metrics Monitor) - single AZ
NiFi - 3 node cluster, 3 AZ
Zookeeper - 3 nodes, 3 AZ
Ambari config in RDS
All EC2 servers are started each morning via the AWS CLI, and the Ambari server has autostart enabled.
After startup, the Ambari UI shows all services as stopped, even though all EC2 nodes are running. An errno 111 (connection refused) error is reported for ZooKeeper and Ambari Metrics. Images below.
Ambari version 2.7.4.0-118
HDF 3.4
Python 2.7.5 on the Ambari server
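One way to check whether the errno 111 (connection refused) errors simply mean that nothing is listening on the service ports yet when the check runs is a small TCP probe run shortly after the nodes boot. This is a minimal sketch, not part of Ambari; `check_port` is a hypothetical helper, and the ports mentioned in the note below (2181 for the ZooKeeper client port, 6188 for the Ambari Metrics Collector) are the usual defaults and are an assumption for this cluster.

```python
import errno
import socket


def check_port(host, port, timeout=3.0):
    """Hypothetical helper: classify a TCP connect attempt.

    Returns "open", "refused" (errno 111 on Linux), "timeout",
    or "error: <reason>" for anything else (e.g. DNS failure).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # No answer at all: host down, or traffic dropped by a firewall.
        return "timeout"
    except socket.error as exc:
        if exc.errno == errno.ECONNREFUSED:
            # Host is up, but no process is listening on this port.
            return "refused"
        return "error: %s" % exc
    sock.close()
    return "open"
```

For example, `check_port("<zookeeper-host>", 2181)` returning "refused" right after boot and "open" after the manual restart would suggest the service was simply not started, rather than a network or security-group problem.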
After logging in to the Ambari UI, a manual rolling restart of all services succeeds.
I am having trouble finding the cause of the autostart issue.
The ambari-agent.log file shows the following errors, and I am not sure what they mean:
ERROR 2020-04-03 13:12:13,827 ComponentVersionReporter.py:91 - Could not get version for component METRICS_MONITOR of AMBARI_METRICS service cluster_id=2. Command returned: {'structuredOut': {}, 'stdout': '2020-04-03 13:12:13,794 - Skipping stack-select on AMBARI_METRICS because it does not exist in the stack-select package structure.', 'stderr': '', 'exitcode': 0}
INFO 2020-04-03 13:12:14,082 security.py:135 - Event to server at /reports/component_version (correlation_id=9): {'clusters': defaultdict(<function <lambda> at 0x7ff5967daf50>, {u'2': [{'componentName': u'ZOOKEEPER_CLIENT', 'serviceName': u'ZOOKEEPER', 'clusterId': u'2', 'version': u'3.4.1.1-4'}]})}
INFO 2020-04-03 13:12:14,085 __init__.py:82 - Event from server at /user/ (correlation_id=9): {u'status': u'OK'}
INFO 2020-04-03 13:12:23,544 security.py:135 - Event to server at /heartbeat (correlation_id=10): {'id': 1}
INFO 2020-04-03 13:12:23,547 __init__.py:82 - Event from server at /user/ (correlation_id=10): {u'status': u'OK', u'id': 0}
ERROR 2020-04-03 13:12:23,548 HeartbeatThread.py:217 - Error in responseId sequence - restarting
INFO 2020-04-03 13:12:23,549 transport.py:358 - Receiver loop ended
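To see which agent modules log errors around startup, the log lines above can be parsed and the ERROR entries counted per source file. The following is a rough sketch that assumes every line follows the `LEVEL date time,ms file:line - message` layout seen in the excerpt; `parse_line` and `summarize_errors` are hypothetical names, not part of Ambari.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above, e.g.
# ERROR 2020-04-03 13:12:23,548 HeartbeatThread.py:217 - Error in ...
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<source>\S+):(?P<lineno>\d+) - "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize_errors(lines):
    """Count ERROR entries per source file (e.g. HeartbeatThread.py)."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "ERROR":
            counts[entry["source"]] += 1
    return counts
```

Passing the lines of the agent log (typically under /var/log/ambari-agent/, which is an assumption about this installation) to `summarize_errors` gives a quick view of whether the heartbeat errors dominate right after the morning start.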
Any assistance on how to troubleshoot further would be much appreciated. I can post more logs if needed.