[Ambari] Critical Random Alerts: connection failed to IP:PORT

New Contributor

Hi all,

We've installed an HDP cluster (2.3.4.0-3485) on Azure. All services have been installed, including Ambari (2.2.0.0).

We also "kerberized" the cluster.

Nevertheless, alerts sometimes appear reporting a failed connection to a service (which service is also random) and mentioning credentials (see screenshots).

After a while, they disappear.

Do you know how I can manage them or, better, correct this issue?

I'd really appreciate your help.

KR,

Michaël

2525-alerts.png

2526-yarn-example.png

1 ACCEPTED SOLUTION

Re: [Ambari] Critical Random Alerts: connection failed to IP:PORT

Super Collaborator

This is a known issue:

https://issues.apache.org/jira/browse/AMBARI-14847

Fixed in Ambari 2.2.2. There are two workarounds, but they are not ideal:

1. You can make the alert thread pool single-threaded

/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py

https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/AlertScheduler...

And change the parameters to:

  APS_CONFIG = {
      'apscheduler.threadpool.core_threads': 1,
      'apscheduler.threadpool.max_threads': 1,
      'apscheduler.coalesce': True,
      'apscheduler.standalone': False,
      'apscheduler.misfire_grace_time': 5
  }

2. You can try increasing the timeout period in

/usr/lib/python2.6/site-packages/resource_management/libraries/functions/curl_krb_request.py

Change the -5m to something higher, like -12h

https://github.com/apache/ambari/blob/trunk/ambari-common/src/main/python/resource_management/librar...

This would need to be done on each agent experiencing the problem. Or, just wait for Ambari 2.2.2, which should be out soon.
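
A minimal sketch of how you might decide, per agent host, whether a workaround is still needed, and what workaround 1 changes. The helper names (parse_version, has_fix, apply_single_threaded) are hypothetical and not part of Ambari; the only facts taken from this thread are the fix version (2.2.2) and the two thread-pool keys shown above. The input dict in the usage note is an illustrative example, not the shipped default.

```python
# Hypothetical helpers for illustration only; none of these exist in Ambari.

FIXED_IN = (2, 2, 2)  # per this answer: "Fixed in Ambari 2.2.2"

# Workaround 1 expressed as overrides for the two thread-pool keys.
SINGLE_THREADED = {
    'apscheduler.threadpool.core_threads': 1,
    'apscheduler.threadpool.max_threads': 1,
}


def parse_version(version):
    """Turn '2.2.0.0' into (2, 2, 0, 0); a '-build' suffix is ignored."""
    return tuple(int(part) for part in version.split('-')[0].split('.'))


def has_fix(version):
    """True if the given Ambari version is 2.2.2 or later."""
    return parse_version(version) >= FIXED_IN


def apply_single_threaded(aps_config):
    """Return a copy of an APS_CONFIG-style dict with a one-thread pool."""
    patched = dict(aps_config)
    patched.update(SINGLE_THREADED)
    return patched
```

For example, has_fix('2.2.0.0') is False for the Ambari version mentioned in the question, so the workarounds would still apply there. apply_single_threaded leaves every other key in the dict untouched and does not modify its argument.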

6 REPLIES

Re: [Ambari] Critical Random Alerts: connection failed to IP:PORT

Mentor

Which version of Java are you using?

Re: [Ambari] Critical Random Alerts: connection failed to IP:PORT

New Contributor

Thank you @Neeraj Sabharwal for your quick answer.

We had already seen this topic and applied the workaround. We also tried 5m40, but we get the same errors.

curl_krb_request file

2531-curl-krb-request.jpeg

alert_check_oozie_server file

2532-alert-check-oozie-server.png

To answer @Artem Ervits: my Java version is Oracle jdk1.8.0_60.

I use Ambari 2.2, and I see that I have two OOZIE folders on UNIX: 4.0.0.2.0 and 4.2.0.2.3. The modification was made in the first one (4.0.0.2.0) because the second one does not contain the alert_check_oozie_server.py file. Ambari reports my OOZIE version as 4.2.0.2.3.

Any advice?

Many thanks

Re: [Ambari] Critical Random Alerts: connection failed to IP:PORT

New Contributor

Thank you very much for your reply.

I think we will wait for Ambari 2.2.2: now that we understand the alert, it's not really critical.

Best regards,

Michaël

Re: [Ambari] Critical Random Alerts: connection failed to IP:PORT

Mentor

@Michael DURIEUX please accept the best answer.
