[Ambari] Critical Random Alerts: connection failed to IP:PORT


Hi all,

We've installed an HDP cluster (2.3.4.0-3485) on Azure. All services are deployed, including Ambari (2.2.0.0).

We also kerberized the cluster.

However, alerts occasionally appear reporting a failed connection to a service (the affected service varies randomly), with a message about credentials (see screenshots).

After a while, they disappear.

Do you know how I can manage these alerts or, better, fix the underlying issue?

I'd really appreciate your help.

Kind regards,

Michaël

2525-alerts.png

2526-yarn-example.png

Accepted solution

Super Collaborator

This is a known issue:

https://issues.apache.org/jira/browse/AMBARI-14847

It is fixed in Ambari 2.2.2. There are two workarounds, though neither is ideal:

1. Make the alert thread pool single-threaded by editing

/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py

https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/AlertScheduler...

and changing the parameters to:

  APS_CONFIG = {
    'apscheduler.threadpool.core_threads': 1,
    'apscheduler.threadpool.max_threads': 1,
    'apscheduler.coalesce': True,
    'apscheduler.standalone': False,
    'apscheduler.misfire_grace_time': 5
  }

2. You can try increasing the timeout period in

/usr/lib/python2.6/site-packages/resource_management/libraries/functions/curl_krb_request.py

Change -5m to something higher, such as -12h.

https://github.com/apache/ambari/blob/trunk/ambari-common/src/main/python/resource_management/librar...
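Because the first workaround has to be repeated on every affected agent, it can help to script it. The following is a minimal Python sketch, not part of Ambari: it assumes the thread settings appear in AlertSchedulerHandler.py as quoted entries like `'apscheduler.threadpool.core_threads': 3`, matching the dict above, and the helper names `patch_thread_settings` and `make_single_threaded` are hypothetical. It only rewrites the numbers of keys that already exist; if `max_threads` is missing from your copy, add that line by hand. It does not touch the -5m value from the second workaround.

```python
import re
import shutil

# Path quoted in the answer; adjust it if your agent is installed elsewhere.
ALERT_HANDLER = "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py"

# Matches entries such as 'apscheduler.threadpool.core_threads': 3 and
# captures everything before the number so only the number is replaced.
THREAD_SETTING = re.compile(
    r"('apscheduler\.threadpool\.(?:core|max)_threads'\s*:\s*)\d+"
)


def patch_thread_settings(source):
    """Return (patched_source, matches) with every matched thread count set to 1."""
    return THREAD_SETTING.subn(r"\g<1>1", source)


def make_single_threaded(path=ALERT_HANDLER):
    """Back up `path` as `path + '.bak'`, then write the patched file."""
    with open(path) as handle:
        source = handle.read()
    patched, matches = patch_thread_settings(source)
    if matches == 0:
        # The file does not look like the layout this sketch assumes.
        raise ValueError("no apscheduler thread settings found in %s" % path)
    shutil.copy2(path, path + ".bak")  # keep the original for rollback
    with open(path, "w") as handle:
        handle.write(patched)
    return matches
```

On an agent host you would call `make_single_threaded()` with write access to the file. The agent most likely has to be restarted (for example with `ambari-agent restart`) before the change takes effect; that step is an assumption, not something stated in the answer.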

Either workaround has to be applied on each agent host that experiences the problem. Alternatively, just wait for Ambari 2.2.2, which should be released soon.
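Deciding between a workaround and waiting comes down to comparing a host's Ambari version with 2.2.2, the release the answer names as containing the fix. A minimal sketch, assuming dotted numeric version strings (an optional build suffix after `-`, in the style of HDP's 2.3.4.0-3485, is ignored); `parse_version` and `needs_workaround` are hypothetical helper names:

```python
# Release that the answer says contains the fix for AMBARI-14847.
FIXED_IN = (2, 2, 2)


def parse_version(version):
    """Turn '2.2.0.0' (or '2.3.4.0-3485') into a tuple of ints."""
    numeric = version.split("-")[0]  # drop an optional build suffix
    return tuple(int(part) for part in numeric.split("."))


def needs_workaround(version, fixed_in=FIXED_IN):
    """Return True if `version` is older than `fixed_in`."""
    current = parse_version(version)
    width = max(len(current), len(fixed_in))
    # Pad with zeros so that 2.2.2 and 2.2.2.0 compare as equal.
    current = current + (0,) * (width - len(current))
    target = tuple(fixed_in) + (0,) * (width - len(fixed_in))
    return current < target


print(needs_workaround("2.2.0.0"))  # True: the Ambari version in this thread
print(needs_workaround("2.2.2.0"))  # False
```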

Replies

Master Mentor

What version of Java are you using?

Master Mentor


Thank you @Neeraj Sabharwal for your quick answer.

We had already seen this topic and applied the solution. We also tried 5m40, but got the same errors.

curl_krb_request file

2531-curl-krb-request.jpeg

alert_check_oozie_server file

2532-alert-check-oozie-server.png

To answer @Artem Ervits: my Java version is Oracle jdk1.8.0_60.

I use Ambari 2.2, and I see that I have two OOZIE folders on the Unix host: 4.0.0.2.0 and 4.2.0.2.3. I made the modification in the first one (4.0.0.2.0) because the second one does not contain the alert_check_oozie_server.py file. Ambari reports my OOZIE version as 4.2.0.2.3.

Any advice?

Many thanks


Thank you very much for your reply.

I think we will wait for Ambari 2.2.2: now that we understand the alert, it's not really critical.

Best regards,

Michaël

Master Mentor

@Michael DURIEUX please accept the best answer.