Support Questions

michaeldurieux2 · ‎03-02-2016

Hi all,

We've installed an HDP cluster (2.3.4.0-3485) on Azure. All services have been deployed, including Ambari (2.2.0.0).

We also "kerberized" the cluster.

Nevertheless, alerts sometimes appear reporting that a connection to a service failed (the affected service is random too) and mentioning credentials (see screenshot).

After a while, they disappear.

Do you know how I can manage them or, better, correct this issue?

I'd really appreciate your help.

KR,

Michaël

aervits · ‎03-02-2016

What version of Java are you using?

nsabharwal · ‎03-02-2016

@Michael DURIEUX

This is a known issue: https://community.hortonworks.com/content/kbentry/10464/ambari-alerts-phantom-or-false-alerts-on-ker...

michaeldurieux2 · ‎03-02-2016

Thank you @Neeraj Sabharwal for your quick answer.

We have already seen this topic and applied the solution. We also tried with 5m40, but we get the same errors.

curl_krb_request file

alert_check_oozie_server file

To answer @Artem Ervits: my Java version is Oracle jdk1.8.0_60.

I use Ambari 2.2, and I see that I have two OOZIE folders on UNIX: 4.0.0.2.0 and 4.2.0.2.3. The modification was made in the first one, 4.0.0.2.0, because the second one does not contain the alert_check_oozie_server.py file. Ambari reports my OOZIE version as 4.2.0.2.3.

Any advice?

Many thanks

jonathanhurley · ‎03-03-2016

This is a known issue:

https://issues.apache.org/jira/browse/AMBARI-14847

Fixed in Ambari 2.2.2. There are two workarounds, but they are not ideal:

1. You can make the alert thread pool single-threaded by editing this file on the agent:

/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py

https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/AlertScheduler...

And change the parameters to:

  APS_CONFIG = {
      'apscheduler.threadpool.core_threads': 1,
      'apscheduler.threadpool.max_threads': 1,
      'apscheduler.coalesce': True,
      'apscheduler.standalone': False,
      'apscheduler.misfire_grace_time': 5
  }

2. You can try increasing the timeout period in

/usr/lib/python2.6/site-packages/resource_management/libraries/functions/curl_krb_request.py

Change the -5m to something higher, like -12h

https://github.com/apache/ambari/blob/trunk/ambari-common/src/main/python/resource_management/librar...
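To illustrate what workaround 1 changes, here is a minimal sketch using only the Python 3 standard library. It is not Ambari or APScheduler code: `max_overlap` and the dummy job are hypothetical names made up for this example, and the link to the alert problem rests on the assumption that the false alerts come from alert checks running at the same time. The sketch only shows that a pool capped at one worker never runs two jobs concurrently.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def max_overlap(worker_count, job_count=4):
    """Run job_count dummy jobs and return the peak number running at once.

    Hypothetical helper for illustration; it stands in for alert checks.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def job():
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # pretend to do some work
        with lock:
            running -= 1

    # Leaving the "with" block waits for every submitted job to finish.
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        for _ in range(job_count):
            pool.submit(job)
    return peak


if __name__ == "__main__":
    print("peak with 1 worker:", max_overlap(1))
    print("peak with 4 workers:", max_overlap(4))
```

The trade-off is the one you would expect: with a single worker the jobs queue up behind each other, so a slow check delays the ones scheduled after it.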

This would need to be done on each agent experiencing the problem. Or, just wait for Ambari 2.2.2, which should be out soon.
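Because the edit for workaround 2 has to be repeated on every affected agent, a small script can help. This is only a sketch in Python 3 with hypothetical helper names (`bump_timeout`, `patch_file`); it assumes the value appears in the file as the literal text `-5m` and does a blind text replacement, so check the returned count and the `.bak` backup before relying on the result.

```python
import shutil
from pathlib import Path


def bump_timeout(text, old="-5m", new="-12h"):
    """Return (updated_text, replacement_count) after swapping old for new.

    Blind literal replacement: it does not understand the surrounding code.
    """
    count = text.count(old)
    return text.replace(old, new), count


def patch_file(path, old="-5m", new="-12h"):
    """Back up path to '<name>.bak', rewrite it, and return the count.

    The file is left untouched when the old value is not found.
    """
    path = Path(path)
    updated, count = bump_timeout(path.read_text(), old, new)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(updated)
    return count
```

Calling `patch_file` with the curl_krb_request.py path mentioned above would apply the change on one agent; a count other than the one you expect is a sign that the file differs from what this sketch assumes.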

michaeldurieux2 · ‎03-04-2016

Thank you very much for your reply.

I think we will wait for Ambari 2.2.2 because, now that we understand the alert, it's not really critical...

Best regards,

Michaël

aervits · ‎03-04-2016

@Michael DURIEUX please accept the best answer.

Cloudera Community

[Ambari] Critical Random Alerts: connection failed to IP:PORT