Created on 05-26-2016 04:18 AM - edited 09-16-2022 01:34 AM
We recently had some issues with Ambari and Kafka alerts. In HDP 2.3.4, every time we changed the Kafka listener port from 6667 to any other port, Ambari would complain that it couldn’t reach port 6667, even though the broker service was actually running on another port. Here’s the exact error:
“Connection failed: [Errno 111] Connection refused to sandbox.hortonworks.com:6667”
This can be quite annoying, especially if you have multiple brokers that are all reporting CRITICAL and you just can’t get rid of the alert.
To cut a long story short, here are the steps we took to get rid of the problem. Some troubleshooting tips follow in the section after the steps.
1. Get the ID of the Kafka broker alert definition (the entry named kafka_broker_process; 47 in our case)
> curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions"
2. Get the definition and save it locally
> curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" > kafka_alerts.json
3. Edit kafka_alerts.json. Ours looked like this after editing (note the default_port and uri values in the source section):
{
  "AlertDefinition" : {
    "cluster_name" : "Sandbox",
    "component_name" : "KAFKA_BROKER",
    "description" : "This host-level alert is triggered if the Kafka Broker cannot be determined to be up.",
    "enabled" : true,
    "id" : 47,
    "ignore_host" : false,
    "interval" : 1,
    "label" : "Kafka Broker Process",
    "name" : "kafka_broker_process",
    "scope" : "HOST",
    "service_name" : "KAFKA",
    "source" : {
      "default_port" : 9092,
      "reporting" : {
        "critical" : { "value" : 5.0, "text" : "Connection failed: {0} to {1}:{2}" },
        "warning" : { "text" : "TCP OK - {0:.3f}s response on port {1}", "value" : 1.5 },
        "ok" : { "text" : "TCP OK - {0:.3f}s response on port {1}" }
      },
      "type" : "PORT",
      "uri" : "{{kafka-broker/port}}"
    }
  }
}
4. Upload the file
Do this by running the following command in the directory where you saved the kafka_alerts.json file:
> curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" -d @kafka_alerts.json
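If you would rather script the lookup in step 1 and the edit in step 3 than do them by hand, here is a minimal Python sketch. It assumes the list response from step 1 has an "items" array whose entries each carry an "AlertDefinition" object with "id" and "name" fields; check this against your own output. The helper names find_alert_id and with_default_port are made up for this example and are not part of Ambari.

```python
import copy


def find_alert_id(listing, name="kafka_broker_process"):
    """Return the id of the alert definition called `name`, or None.

    `listing` is the parsed JSON from the step 1 request. The "items"
    layout used here is an assumption about that response.
    """
    for item in listing.get("items", []):
        definition = item.get("AlertDefinition", {})
        if definition.get("name") == name:
            return definition.get("id")
    return None


def with_default_port(saved, port):
    """Return a new payload with source.default_port set to `port`.

    `saved` is the parsed JSON from step 2. Only the "AlertDefinition"
    key is kept, matching the file shown in step 3. The input is not
    modified.
    """
    definition = copy.deepcopy(saved["AlertDefinition"])
    definition["source"]["default_port"] = int(port)
    return {"AlertDefinition": definition}
```

You would load the two responses with json.load, call these helpers, and write the result back to kafka_alerts.json with json.dump before running the upload command above.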
It can take up to a minute for Ambari to run the alert check again. To speed things up, you can force Ambari to run the check right away:
> curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47?run_now=true"
This should solve the issue.
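To confirm independently of Ambari that the broker is listening on the new port, you can try a plain TCP connect from the broker host. The sketch below only imitates what a PORT-type alert reports, reusing the message texts from the definition above. It is not Ambari's own code, and check_port is a name made up for this example.

```python
import socket
import time


def check_port(host, port, timeout=5.0):
    """Try a TCP connect to host:port and return (ok, message)."""
    start = time.time()
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.time() - start
            return True, "TCP OK - {0:.3f}s response on port {1}".format(elapsed, port)
    except OSError as exc:
        return False, "Connection failed: {0} to {1}:{2}".format(exc, host, port)
```

For example, check_port("sandbox.hortonworks.com", 9092) returns a failure message if nothing is listening on 9092. In that case the problem is on the Kafka side rather than in the alert definition.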
Troubleshooting:
If you're still having trouble, these suggestions/tips should help you out.
If the Ambari agent logs a traceback like this one:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 274, in __json_to_callable
    source = json_definition['source']
TypeError: 'NoneType' object is unsubscriptable
This means that your JSON is invalid, either because of a number format exception or for some other reason. Check ambari-server.log for additional information.
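A quick way to catch invalid JSON before uploading is to parse the file locally. The sketch below is one possible pre-flight check. The fields it inspects are taken from the definition shown earlier, the checks are not exhaustive, and validate_definition is a name made up for this example.

```python
import json


def validate_definition(text):
    """Return a list of problems found in the alert definition text.

    An empty list means the text parsed as JSON and has the basic
    shape used in this article.
    """
    try:
        data = json.loads(text)
    except ValueError as exc:
        # Curly quotes or a missing brace end up here.
        return ["not valid JSON: {0}".format(exc)]
    definition = data.get("AlertDefinition") if isinstance(data, dict) else None
    if not isinstance(definition, dict):
        return ["missing AlertDefinition object"]
    source = definition.get("source")
    if not isinstance(source, dict):
        return ["missing source object"]
    problems = []
    if "type" not in source:
        problems.append("source has no type")
    port = source.get("default_port")
    if port is not None and not isinstance(port, (int, float)):
        problems.append("default_port is not a number")
    return problems
```

Running it on the contents of kafka_alerts.json, for example validate_definition(open("kafka_alerts.json").read()), flags curly quotes copied from a web page, since they are not valid JSON string delimiters.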
Manually trigger the alert and look for log lines like this one to validate it:
INFO 2016-05-25 16:48:33,355 AlertSchedulerHandler.py:374 - [AlertScheduler] Executing on-demand alert kafka_broker_process (1e0e1edc-e051-45bc-8d38-97ae0b3b83f0)
This at least gives you confidence that your alert definition is valid. If you get this type of error instead:
ERROR 2016-05-25 19:40:21,470 AlertSchedulerHandler.py:379 - [AlertScheduler] Unable to execute the alert outside of the job scheduler
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 363, in execute_alert
    alert_definition = execution_command['alertDefinition']
KeyError: 'alertDefinition'
Then you know something's wrong with the alert definition.
If things still don't work, try removing the uri from the alert definition. This forces Ambari to fall back to default_port: Ambari's alert scheduler looks at the uri first and uses it if it's valid; otherwise it falls back to default_port. If you remove the uri, remember to also remove the comma after "PORT".
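The fallback described above can be pictured roughly as follows. This is only an illustration of the behaviour, not Ambari's actual agent code. resolve_port is a name made up for this example, and the accepted value formats (a bare port number, or a listener string such as PLAINTEXT://host:6667) are assumptions.

```python
from urllib.parse import urlparse


def resolve_port(uri_value, default_port):
    """Pick the port from the resolved uri value, else use default_port."""
    if uri_value:
        # Assumption: a listeners value may be comma-separated; use the first.
        value = str(uri_value).split(",")[0].strip()
        if value.isdigit():
            return int(value)
        if "://" in value:
            try:
                port = urlparse(value).port
            except ValueError:
                port = None  # the port part was not a valid number
            if port is not None:
                return port
    # Missing, empty or unusable uri: fall back to default_port.
    return int(default_port)
```

With this sketch, an unresolved or missing uri yields default_port, which is why removing the uri makes the default_port setting take effect.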
Created on 10-04-2016 09:34 AM
After performing the steps mentioned above, I'm still facing the same issue.
The alert is the same:
“Connection failed: [Errno 111] Connection refused to localhost:6667”
curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/XXXX/alert_definitions/17"
{ "href" : "http://localhost:8080/api/v1/clusters/flumetest/alert_definitions/17", "AlertDefinition" : { "cluster_name" : "XXXX", "component_name" : "KAFKA_BROKER", "description" : "This host-level alert is triggered if the Kafka Broker cannot be determined to be up.", "enabled" : true, "id" : 17, "ignore_host" : false, "interval" : 1, "label" : "Kafka Broker Process", "name" : "kafka_broker_process", "scope" : "HOST", "service_name" : "KAFKA", "source" : { "default_port" : 9092.0, "reporting" : { "critical" : { "value" : 5.0, "text" : "Connection failed: {0} to {1}:{2}" }, "warning" : { "value" : 1.5, "text" : "TCP OK - {0:.3f}s response on port {1}" }, "ok" : { "text" : "TCP OK - {0:.3f}s response on port {1}" } }, "type" : "PORT", "uri" : "{{kafka-broker/listeners}}" } } }
After changing the listener port in the config to 9092, I'm getting:
Connection failed: [Errno 111] Connection refused to localhost:9092