
Connection refused on Ambari autostart services: NiFi, ZooKeeper, Metrics Monitor


I have the following HDF setup:

Ambari Server - AWS ASG (3 AZs)

Ambari Metrics (NiFi CA, Grafana, Metrics Monitor) - single AZ

NiFi - 3-node cluster, 3 AZs

ZooKeeper - 3 nodes, 3 AZs

Ambari config in RDS

 

All EC2 servers are started each morning via the AWS CLI, and the Ambari server has autostart enabled.

 

After startup, the Ambari UI shows all services as stopped, even though all EC2 nodes are running. An errno 111 (connection refused) error is reported for ZooKeeper and Ambari Metrics. Screenshots are below.
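To check whether the service ports are actually refusing connections right after boot, a small TCP probe along these lines could be run from the Ambari server. This is only a sketch: the hostnames are hypothetical placeholders, and the ports assume the usual defaults (2181 for the ZooKeeper client port, 6188 for the Ambari Metrics Collector), which may differ in this cluster.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the result.

    Returns 'open', 'refused' (errno 111 on Linux), 'timeout',
    or 'error: <reason>' for anything else (e.g. DNS failure).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host is reachable but nothing is listening on the port yet.
        return "refused"
    except socket.timeout:
        # No answer at all: host down, or traffic dropped by a firewall
        # or security group.
        return "timeout"
    except OSError as exc:
        return "error: {}".format(exc)


if __name__ == "__main__":
    # Hypothetical hostnames and assumed default ports; replace with
    # the real values for the cluster.
    targets = [
        ("zk1.example.internal", 2181),
        ("zk2.example.internal", 2181),
        ("zk3.example.internal", 2181),
        ("metrics.example.internal", 6188),
    ]
    for host, port in targets:
        print("{}:{} -> {}".format(host, port, probe(host, port)))
```

A "refused" result means the node is up but the service process is not listening, while "timeout" points more toward the instance or network not being ready yet.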

 

Ambari version: 2.7.4.0-118

HDF: 3.4

Python: 2.7.5 on the Ambari server

 

[Screenshots: nifi-704.PNG, ambari-metrics-704.PNG, zookeeper-704.PNG]

 

After logging into the Ambari UI, a manual rolling restart of all services succeeds.

 

I am having trouble finding the cause of the autostart failure.

 

The ambari-agent.log file shows the following errors, and I am not sure what they mean:

 

ERROR 2020-04-03 13:12:13,827 ComponentVersionReporter.py:91 - Could not get version for component METRICS_MONITOR of AMBARI_METRICS service cluster_id=2. Command returned: {'structuredOut': {}, 'stdout': '2020-04-03 13:12:13,794 - Skipping stack-select on AMBARI_METRICS because it does not exist in the stack-select package structure.', 'stderr': '', 'exitcode': 0}
INFO 2020-04-03 13:12:14,082 security.py:135 - Event to server at /reports/component_version (correlation_id=9): {'clusters': defaultdict(<function <lambda> at 0x7ff5967daf50>, {u'2': [{'componentName': u'ZOOKEEPER_CLIENT', 'serviceName': u'ZOOKEEPER', 'clusterId': u'2', 'version': u'3.4.1.1-4'}]})}
INFO 2020-04-03 13:12:14,085 __init__.py:82 - Event from server at /user/ (correlation_id=9): {u'status': u'OK'}
INFO 2020-04-03 13:12:23,544 security.py:135 - Event to server at /heartbeat (correlation_id=10): {'id': 1}
INFO 2020-04-03 13:12:23,547 __init__.py:82 - Event from server at /user/ (correlation_id=10): {u'status': u'OK', u'id': 0}
ERROR 2020-04-03 13:12:23,548 HeartbeatThread.py:217 - Error in responseId sequence - restarting
INFO 2020-04-03 13:12:23,549 transport.py:358 - Receiver loop ended
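To pull only the ERROR entries out of a larger ambari-agent.log before posting more of it, a filter like the following might help. It is a sketch that assumes every entry follows the `LEVEL timestamp file:line - message` layout seen in the excerpt above; the log path is the usual default location and is an assumption here, and `error_entries` is just an illustrative helper name.

```python
import re

# Matches lines shaped like:
# ERROR 2020-04-03 13:12:23,548 HeartbeatThread.py:217 - Error in ...
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+:\d+) - "
    r"(?P<message>.*)$"
)


def error_entries(lines):
    """Yield (timestamp, source, message) for each ERROR line."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            yield match.group("ts"), match.group("source"), match.group("message")


if __name__ == "__main__":
    # Assumed default log location; adjust if the agent logs elsewhere.
    log_path = "/var/log/ambari-agent/ambari-agent.log"
    with open(log_path) as handle:
        for ts, source, message in error_entries(handle):
            print("{} {} {}".format(ts, source, message))
```

Lines that do not match the layout (for example wrapped tracebacks) are skipped, so the output is a summary rather than a complete record.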

 

Any help on how to troubleshoot this further would be much appreciated. I can post more logs if needed.

 

 
