We recently ran into some issues with Ambari and Kafka alerts. It started in HDP 2.3.4: every time we changed the Kafka listener port from 6667 to any other port, Ambari would report an error saying it couldn't reach port 6667, even though the broker was actually running on the new port. Here's the exact error:

“Connection failed: [Errno 111] Connection refused to sandbox.hortonworks.com:6667”

This can be quite annoying, especially if you have multiple brokers that all report CRITICAL and you can't seem to get rid of the alert.

To cut a long story short, here are the steps we took to fix the problem. After that, we go through some troubleshooting tips.

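1. Get the ID of the Kafka broker alert definition

> curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions"

Find the Kafka Broker Process definition in the output and note its id (47 on our sandbox).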
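When the cluster has many alert definitions, finding the Kafka one by eye is tedious. Below is a minimal Python sketch that scans the saved GET response for it. It assumes the response is an object with an `items` array whose entries each hold an `AlertDefinition` with an `id` plus a `name` and/or `label`; which fields Ambari returns by default may differ between versions, so the sketch matches on either the name `kafka_broker_process` or the label `Kafka Broker Process`. The function name `find_alert_id` is our own, not part of Ambari.

```python
import json
from typing import Optional


def find_alert_id(response_text: str,
                  name: str = "kafka_broker_process",
                  label: str = "Kafka Broker Process") -> Optional[int]:
    """Return the id of the matching alert definition, or None if absent.

    Assumed response shape (may vary by Ambari version):
    {"items": [{"AlertDefinition": {"id": ..., "name": ..., "label": ...}}, ...]}
    """
    data = json.loads(response_text)
    for item in data.get("items", []):
        definition = item.get("AlertDefinition", {})
        # Match on the internal name if present, otherwise on the display label.
        if definition.get("name") == name or definition.get("label") == label:
            return definition.get("id")
    return None
```

For example, save the output of the curl command above to a file, read it, and pass the text to `find_alert_id`; it returns `None` if nothing matches.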

2. Get the alert definition and save it locally

> curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" > kafka_alerts.json

3. Edit kafka_alerts.json

  • Remove the href line.
  • Change default_port from 6667.0 to your new port (e.g. 9092). Do NOT use a decimal such as 9092.0, or ambari-server.log will show a NumberFormatException and no alerts will be reported.
  • The final JSON file should look like this:
{
  "AlertDefinition" : {
    "cluster_name" : "Sandbox",
    "component_name" : "KAFKA_BROKER",
    "description" : "This host-level alert is triggered if the Kafka Broker cannot be determined to be up.",
    "enabled" : true,
    "id" : 47,
    "ignore_host" : false,
    "interval" : 1,
    "label" : "Kafka Broker Process",
    "name" : "kafka_broker_process",
    "scope" : "HOST",
    "service_name" : "KAFKA",
    "source" : {
      "default_port" : 9092,
      "reporting" : {
        "critical" : {
          "value" : 5.0,
          "text" : "Connection failed: {0} to {1}:{2}"
        },
        "warning" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}",
          "value" : 1.5
        },
        "ok" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}"
        }
      },
      "type" : "PORT",
      "uri" : "{{kafka-broker/port}}"
    }
  }
}
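The edits in step 3 can also be scripted so the decimal port cannot slip through. This is a minimal Python sketch, assuming the saved file has the same layout as the GET response (a top-level href plus an AlertDefinition object); the function name `prepare_definition` is illustrative, not an Ambari tool.

```python
import json


def prepare_definition(raw_text: str, new_port: int) -> str:
    """Return JSON text ready for the PUT in step 4."""
    data = json.loads(raw_text)
    # The GET response includes an href, which we drop before uploading.
    data.pop("href", None)
    source = data["AlertDefinition"]["source"]
    # int() guarantees there is no trailing ".0", which the article reports
    # causes a NumberFormatException in ambari-server.log.
    source["default_port"] = int(new_port)
    return json.dumps(data, indent=2)
```

For example, read kafka_alerts.json, pass its contents and 9092 to `prepare_definition`, and write the result back to the same file before running step 4. Note that the sketch leaves uri untouched, so it only changes the fallback port.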

4. Upload the file

Run this command from the directory where you saved kafka_alerts.json:

> curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" -d @kafka_alerts.json

It can take up to a minute for Ambari to run the alert check again (the definition's interval is 1 minute). To speed things up, you can force Ambari to run the check immediately:

> curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47?run_now=true"
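If you would rather drive these calls from a script than from curl, the Python sketch below builds the same PUT request with the standard library without sending it. It assumes the URL layout, the admin:admin credentials and the X-Requested-By header from the curl commands above; the function name `build_alert_put` is hypothetical.

```python
import base64
import urllib.request


def build_alert_put(base_url: str, cluster: str, definition_id: int,
                    body: bytes = b"", run_now: bool = False,
                    user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build (but do not send) the PUT used in step 4."""
    url = f"{base_url}/api/v1/clusters/{cluster}/alert_definitions/{definition_id}"
    if run_now:
        # Same query parameter as the curl command that forces the check.
        url += "?run_now=true"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "X-Requested-By": "ambari",  # sent on every curl call above
        "Authorization": f"Basic {token}",  # equivalent of curl -u
    }
    # An empty body becomes None so the forced check sends no payload.
    return urllib.request.Request(url, data=body or None, headers=headers, method="PUT")
```

To send it, pass the request to `urllib.request.urlopen()`. Passing the bytes of kafka_alerts.json as `body` mirrors the upload, while `run_now=True` with no body mirrors the forced check.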

This should solve the issue.

Troubleshooting:

If you're still having trouble, these tips should help.

  1. When uploading the JSON, make sure it is valid. This one is easy to catch, because the PUT returns an error saying the structure is invalid.
  2. Make sure default_port is an INTEGER when you upload. This one is tricky: if you keep the decimal (e.g. 6667.0), the PUT won't return an error, but /var/log/ambari-server/ambari-server.log will show a NumberFormatException. Even trickier, Ambari then starts ignoring these metrics altogether and you end up with this:

    [Screenshot: Ambari no longer reporting the Kafka broker alert]

  3. Tail /var/log/ambari-agent/ambari-agent.log while you run the PUT commands and look out for log entries like these:
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 274, in __json_to_callable
        source = json_definition['source']
    TypeError: 'NoneType' object is unsubscriptable
     

    This means that your JSON is invalid, either because of a number format exception or for some other reason. Correlate it with ambari-server.log to find out more.

  4. Manually trigger the alert (for example with the run_now=true request from step 4) and look for log entries like this to confirm it ran:

      INFO 2016-05-25 16:48:33,355 AlertSchedulerHandler.py:374 - [AlertScheduler] Executing on-demand alert kafka_broker_process (1e0e1edc-e051-45bc-8d38-97ae0b3b83f0)
     
    This at least gives you confidence that your alert definition is valid. If instead you get an error like this:
    ERROR 2016-05-25 19:40:21,470 AlertSchedulerHandler.py:379 - [AlertScheduler] Unable to execute the alert outside of the job scheduler
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 363, in execute_alert
        alert_definition = execution_command['alertDefinition']
    KeyError: 'alertDefinition'
     

    Then you know something's wrong with the alert definition.

  5. If things still don't work, try removing uri from the alert definition. This forces Ambari to fall back to default_port: the alert scheduler first looks at the URI and uses it if it's valid; otherwise it falls back to default_port. If you remove uri, be sure to also remove the comma after "PORT" so the JSON stays valid.
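The checks in tips 1, 2 and 5 can be run locally before uploading. This is a minimal Python sketch, not an Ambari tool: `check_definition` is a hypothetical helper that only looks for the pitfalls described above (invalid JSON such as curly quotes or a stray comma, a leftover href, and a non-integer default_port). An empty result does not guarantee Ambari will accept the definition.

```python
import json
from typing import List


def check_definition(text: str) -> List[str]:
    """Return a list of problems found in an alert-definition file; empty means none found."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # Catches curly quotes around "PORT" and a trailing comma after removing uri.
        return [f"invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    if "href" in data:
        problems.append("remove the top-level href")
    source = data.get("AlertDefinition", {}).get("source", {})
    port = source.get("default_port")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(port, int) or isinstance(port, bool):
        problems.append(f"default_port must be an integer, got {port!r}")
    return problems
```

Run it on the contents of kafka_alerts.json before the PUT and fix anything it reports.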

Comments

After performing the steps above, I'm still facing the same issue.

The alert is the same:

“Connection failed: [Errno 111] Connection refused to localhost:6667”
curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/XXXX/alert_definitions/17"
{
  "href" : "http://localhost:8080/api/v1/clusters/flumetest/alert_definitions/17",
  "AlertDefinition" : {
    "cluster_name" : "XXXX",
    "component_name" : "KAFKA_BROKER",
    "description" : "This host-level alert is triggered if the Kafka Broker cannot be determined to be up.",
    "enabled" : true,
    "id" : 17,
    "ignore_host" : false,
    "interval" : 1,
    "label" : "Kafka Broker Process",
    "name" : "kafka_broker_process",
    "scope" : "HOST",
    "service_name" : "KAFKA",
    "source" : {
      "default_port" : 9092.0,
      "reporting" : {
        "critical" : {
          "value" : 5.0,
          "text" : "Connection failed: {0} to {1}:{2}"
        },
        "warning" : {
          "value" : 1.5,
          "text" : "TCP OK - {0:.3f}s response on port {1}"
        },
        "ok" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}"
        }
      },
      "type" : "PORT",
      "uri" : "{{kafka-broker/listeners}}"
    }
  }
}

After changing the listener port in the config to 9092, I'm getting:

Connection failed: [Errno 111] Connection refused to localhost:9092