[Ambari] Critical Random Alerts: connection failed to IP:PORT

New Contributor

Hi all,

We've installed an HDP cluster (2.3.4.0-3485) on Azure. All services have been deployed, including Ambari (2.2.0.0).

We also "kerberized" the cluster.

Nevertheless, alerts sometimes appear reporting a failed connection to a service (the affected service is random too) and mentioning credentials (see screenshots).

After a while, they disappear.

Do you know how I can manage these alerts or, better, fix the underlying issue?

I'd really appreciate your help.

KR,

Michaël

2525-alerts.png

2526-yarn-example.png

1 ACCEPTED SOLUTION

Super Collaborator

This is a known issue:

https://issues.apache.org/jira/browse/AMBARI-14847

Fixed in Ambari 2.2.2. There are two workarounds, but they are not ideal:

1. You can make the alert thread pool single-threaded

/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py

https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/AlertScheduler...

And change the parameters to:

  APS_CONFIG = {
      'apscheduler.threadpool.core_threads': 1,
      'apscheduler.threadpool.max_threads': 1,
      'apscheduler.coalesce': True,
      'apscheduler.standalone': False,
      'apscheduler.misfire_grace_time': 5
  }

2. You can try increasing the timeout period in

/usr/lib/python2.6/site-packages/resource_management/libraries/functions/curl_krb_request.py

Change the -5m to something higher, like -12h

https://github.com/apache/ambari/blob/trunk/ambari-common/src/main/python/resource_management/librar...
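
For workaround 1, the effect of a single-threaded pool can be illustrated with a small standalone sketch. It does not use the agent's APScheduler code; it uses Python 3's `concurrent.futures` as a stand-in, and `max_overlap` and `dummy_check` are hypothetical names made up for this example. The assumption, suggested by the workaround itself, is that the failures come from alert checks running at the same time, so a pool of size 1 means no two checks overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def max_overlap(pool_size, num_checks):
    """Run dummy checks in a pool and return the peak number running at once."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def dummy_check():
        # Hypothetical stand-in for an alert check that would call kinit/curl.
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)
        with lock:
            state["running"] -= 1

    # Leaving the "with" block waits for all submitted checks to finish.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(num_checks):
            pool.submit(dummy_check)
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrency with 1 thread:", max_overlap(1, 4))
    print("peak concurrency with 4 threads:", max_overlap(4, 4))
```

The trade-off is that checks now run one after another, so a slow check delays the others, which is probably why this workaround is described as not ideal.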

This would need to be done on each agent experiencing the problem. Or, just wait for Ambari 2.2.2, which should be out soon.
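
Since the edit has to be repeated on every affected agent, a small helper can make it less error-prone. The following is only a sketch: `replace_timeout` is a hypothetical function, and it assumes the value appears in the file literally as `-5m`, as described above. If your copy of `curl_krb_request.py` spells it differently, the helper reports no changed lines and the file should be inspected by hand.

```python
def replace_timeout(text, old="-5m", new="-12h"):
    """Replace `old` with `new` and return (new_text, changed_line_numbers)."""
    changed = []
    out = []
    # splitlines(True) keeps line endings so the file is rewritten unchanged
    # apart from the replaced token.
    for number, line in enumerate(text.splitlines(True), start=1):
        if old in line:
            changed.append(number)
            line = line.replace(old, new)
        out.append(line)
    return "".join(out), changed


if __name__ == "__main__":
    # Made-up sample text, NOT the real contents of curl_krb_request.py.
    sample = "first line\nsome_command -5m other_args\nlast line\n"
    new_text, changed = replace_timeout(sample)
    print("changed lines:", changed)
    print(new_text)
```

On a real agent you would read the file, keep a backup copy, write the result only if `changed` is non-empty, and then restart the Ambari agent so the modified script is loaded.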


6 REPLIES 6

Mentor

What version of Java are you using?

New Contributor

Thank you @Neeraj Sabharwal for your quick answer.

We had already seen this topic and applied the solution. We also tried with 5m40, but we get the same errors.

curl_krb_request file

2531-curl-krb-request.jpeg

alert_check_oozie_server file

2532-alert-check-oozie-server.png

To answer @Artem Ervits, my Java version is Oracle jdk1.8.0_60.

I use Ambari 2.2, and I see that I have two OOZIE folders on UNIX: 4.0.0.2.0 and 4.2.0.2.3. The modification was made in the first one (4.0.0.2.0) because the second one does not contain the alert_check_oozie_server.py file. Ambari reports that my OOZIE version is 4.2.0.2.3.

Any advice?

Many thanks

New Contributor

Thank you very much for your reply.

I think we will wait for Ambari 2.2.2 because, now that we understand the alert, it is not really critical...

Best regards,

Michaël

Mentor

@Michael DURIEUX please accept the best answer.