Ambari freezes after running fine for a period of time. Shows OOM errors & requires a restart

Ambari (2.1.1) is configured to run with 12 GB of RAM in a small 4-node cluster. Even so, it runs fine for a period of time and then freezes up and hangs.

Here are some of the errors seen in the log files:

ambari-server.log

12 Oct 2015 13:04:56,303 ERROR [qtp-client-18920] MetricsPropertyProvider:183 - Error getting timeline metrics. Can not connect to collector, socket error.

12 Oct 2015 13:19:16,643 ERROR [qtp-client-18897] MetricsPropertyProvider:183 - Error getting timeline metrics. Can not connect to collector, socket error.

12 Oct 2015 16:02:46,153  WARN [qtp-client-19308] nio:726 - handle failed

13 Oct 2015 02:19:55,555  WARN [Timer-0] ThreadPoolAsynchronousRunner:608 - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@4214238c -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!

ambari-server.out

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "alert-event-bus-3175"
Exception in thread "alert-event-bus-3179" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "alert-event-bus-3179"
Exception in thread "alert-event-bus-3178" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "alert-event-bus-3178"
Exception in thread "alert-event-bus-3180" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "alert-event-bus-3180"
Exception in thread "alert-event-bus-3181" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "alert-event-bus-3181"
Exception in thread "alert-event-bus-3182" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "alert-event-bus-3182"
Exception in thread "alert-event-bus-3183" Exception in thread "alert-event-bus-3184" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "alert-event-bus-3183"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "alert-event-bus-3184"

The thread dump shows threads in the following states (count, state):

   3    java.lang.Thread.State: BLOCKED (on object monitor)
  18    java.lang.Thread.State: RUNNABLE
   9    java.lang.Thread.State: TIMED_WAITING (on object monitor)
  15    java.lang.Thread.State: TIMED_WAITING (parking)
   4    java.lang.Thread.State: WAITING (on object monitor)
  10    java.lang.Thread.State: WAITING (parking)
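
For anyone who wants to produce the same kind of summary, here is a minimal sketch that tallies thread states from a saved jstack dump. The function name and the file path in the usage comment are made up for illustration; the only thing it relies on is that each thread in the dump has a `java.lang.Thread.State:` line.

```shell
#!/bin/sh
# Count threads per state in a saved jstack dump (file path passed as $1).
# Output format is "count state", most frequent state first.
count_thread_states() {
    grep 'java.lang.Thread.State:' "$1" \
        | sed 's/^[[:space:]]*//' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Usage (hypothetical PID and path):
#   jstack -l 12345 > /tmp/ambari-jstack.txt
#   count_thread_states /tmp/ambari-jstack.txt
```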

An Impetus consultant working on the effort noticed that there were too many open connections to the Postgres DB.
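
One rough way to put a number on the open connections is to count established TCP connections whose peer port is the Postgres port. The sketch below assumes the default port 5432 and the column layout of `ss -tn` (state in column 1, peer address:port in column 5). The function name is hypothetical.

```shell
#!/bin/sh
# Read `ss -tn` style output on stdin and print how many ESTAB connections
# have the given peer port (default 5432, the stock Postgres port).
count_established_to_port() {
    port="${1:-5432}"
    awk -v port="$port" \
        '$1 == "ESTAB" && $5 ~ (":" port "$") { n++ } END { print n + 0 }'
}

# Usage on the Ambari Server host:
#   ss -tn | count_established_to_port 5432
```

Running it every few minutes and watching whether the number keeps climbing would show whether connections are leaking or just sitting at a steady pool size.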

Any ideas are appreciated.

Accepted Solutions

Re: Ambari freezes after running fine for a period of time. Shows OOM errors & requires a restart

Cloudera Employee

If you are seeing a deadlock on Ambari 2.1.2, it would be due to Ambari Views instance creation.

https://hortonworks.jira.com/browse/EAR-2415

Re: Ambari freezes after running fine for a period of time. Shows OOM errors & requires a restart

I saw a similar issue a couple of weeks ago. Ambari was running fine, but after some time I had to restart ambari-server because Ambari Metrics was spamming the Ambari log.

If you have Ambari Metrics installed and enabled, could you please stop the service, restart ambari-server, and see if the problem still occurs? Also make sure your Ambari Metrics Service is configured correctly, especially the heap (usually too low by default). Check this link for heap tuning: https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning

Re: Ambari freezes after running fine for a period of time. Shows OOM errors & requires a restart

You need to increase the memory settings for Ambari. I ran into this a while back with certain views.

I added/adjusted the following in:

/var/lib/ambari-server/ambari-env.sh

For "AMBARI_JVM_ARGS"

-Xmx4G -XX:MaxPermSize=512m
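
As a rough sketch of what that change can look like, assuming ambari-env.sh exports AMBARI_JVM_ARGS as a single string (if the existing line already carries an -Xmx value, edit it in place rather than appending a second one). The helper function name is hypothetical; it just filters a java command line so you can check which heap flags the running process actually picked up after a restart.

```shell
#!/bin/sh
# Assumed shape of the setting in /var/lib/ambari-server/ambari-env.sh.
export AMBARI_JVM_ARGS="${AMBARI_JVM_ARGS:-} -Xmx4G -XX:MaxPermSize=512m"

# Print only the heap-related flags from a java command line read on stdin.
show_heap_flags() {
    tr ' ' '\n' | grep -E '^-(Xms|Xmx|XX:MaxPermSize=)'
}

# Usage after `ambari-server restart` (PID file path is an assumption):
#   ps -o args= -p "$(cat /var/run/ambari-server/ambari-server.pid)" | show_heap_flags
```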

Re: Ambari freezes after running fine for a period of time. Shows OOM errors & requires a restart

FWIW, PermGen space was removed in Java 8, so on Java 8 the last parameter (-XX:MaxPermSize) is ignored and only generates a warning.
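
If the same env file has to work on both Java 7 and Java 8 hosts, one option is to add the PermGen flag only when it applies. This is a sketch with a hypothetical function name; it takes a version string of the kind `java -version` reports (for example 1.7.0_80 or 1.8.0_60) and prints the flag only for versions below 8.

```shell
#!/bin/sh
# Print -XX:MaxPermSize=512m for Java versions below 8, nothing otherwise.
# Java 8 replaced PermGen with Metaspace (optionally capped with
# -XX:MaxMetaspaceSize), so no flag is emitted there.
perm_flag_for() {
    version="$1"
    case "$version" in
        1.*) major=$(echo "$version" | cut -d. -f2) ;;  # old scheme: 1.7, 1.8
        *)   major=$(echo "$version" | cut -d. -f1) ;;  # new scheme: 9, 11, ...
    esac
    if [ "$major" -lt 8 ]; then
        echo "-XX:MaxPermSize=512m"
    fi
}

# Usage:
#   AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Xmx4G $(perm_flag_for 1.7.0_80)"
```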

Re: Ambari freezes after running fine for a period of time. Shows OOM errors & requires a restart

Guru

Using views requires increasing both Xmx and MaxPermSize. The documentation that mentions this is located here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.0/bk_ambari_views_guide/content/ch_using_ambar.... If you hit this deadlock again, please capture the jstack output for the Ambari Server process and work with support to see what the issue is.
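
A minimal sketch of capturing that jstack output into a timestamped file, so that several dumps taken during one hang do not overwrite each other. The function name, the output directory, and the PID file path in the usage comment are assumptions; `jstack -l` is the standard JDK tool with lock information enabled.

```shell
#!/bin/sh
# Write a jstack dump for the given PID into the given directory
# (default /tmp) and print the path of the file that was written.
capture_jstack() {
    pid="$1"
    outdir="${2:-/tmp}"
    outfile="$outdir/ambari-jstack-$pid-$(date +%Y%m%d-%H%M%S).txt"
    jstack -l "$pid" > "$outfile" && echo "$outfile"
}

# Usage (run as the user that owns the Ambari Server process):
#   capture_jstack "$(cat /var/run/ambari-server/ambari-server.pid)" /var/tmp
```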

Re: Ambari freezes after running fine for a period of time. Shows OOM errors & requires a restart

New Contributor

A. Run Ambari Metrics in distributed mode rather than embedded mode

If you are running with more than 3 nodes, I strongly suggest running in distributed mode and writing the hbase.rootdir contents to HDFS directly, rather than to the local disk of a single node. This applies to already installed and running IOP clusters.

  1. In the Ambari Web UI, select the Ambari Metrics service and navigate to Configs. Update the following properties:
    • General > Metrics Service operation mode=distributed
    • Advanced ams-hbase-site > hbase.cluster.distributed=true
    • Advanced ams-hbase-site > hbase.rootdir=hdfs://namenode.fqdn.example.org:8020/amshbase
  2. Restart the Metrics Collector and the affected Metrics Monitors.
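
If you would rather script the same three changes than click through the UI, something along these lines may work. Treat all of it as assumptions to check against your install: the path and argument order of Ambari's bundled configs.sh script, the ams-site property name timeline.metrics.service.operation.mode behind the "operation mode" field, and hbase.rootdir as the stock HBase name for the root directory property. The function name is hypothetical, and it only prints the commands so you can review them before running anything.

```shell
#!/bin/sh
# Print (do not run) the configs.sh commands for switching Ambari Metrics
# to distributed mode. Arguments: Ambari host, cluster name, HDFS root dir.
print_ams_distributed_cmds() {
    ambari_host="$1"
    cluster="$2"
    rootdir="$3"
    script=/var/lib/ambari-server/resources/scripts/configs.sh  # assumed path
    echo "$script set $ambari_host $cluster ams-site timeline.metrics.service.operation.mode distributed"
    echo "$script set $ambari_host $cluster ams-hbase-site hbase.cluster.distributed true"
    echo "$script set $ambari_host $cluster ams-hbase-site hbase.rootdir $rootdir"
}

# Usage (hypothetical host and cluster name):
#   print_ams_distributed_cmds ambari.example.org mycluster \
#       hdfs://namenode.fqdn.example.org:8020/amshbase
```

The Metrics Collector and Metrics Monitors still need a restart afterwards, same as in step 2.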