
Ambari Server deadlocks


New Contributor

Hello community.

We are currently facing an issue with Ambari Server that causes performance problems and systematically ends in a JVM crash. Our production cluster is composed of ten nodes running most services provided by the Hortonworks Hadoop stack. The performance alerts are related to the Ambari Server REST API.

We can easily reproduce it by generating activity on the web UI, just by spamming the interface a little (manually, with one or two users). The logs display timeout errors which, after a certain amount of time, end up with a Java OOM. After investigating, here is what we have found so far:

Database

We use a PostgreSQL database, which in its current state is still responsive and reactive. We checked some tables such as alert_history (approximately 20k rows) but found nothing suspicious. We also checked the pg_stat_statements view, and it appears there is no slow query at the moment (the slowest we could observe has only a 1 second average runtime, and it is not even related to Ambari's tables).

JVM

We made six thread dumps and one heap dump after generating activity on the UI to make it crash. The following details were detected:

  • 88 threads are present in the JVM
  • ~50 threads are in BLOCKED state (waiting for a lock release)
  • Of the 25 client threads, 22 are also in BLOCKED state (waiting for a lock release)
  • hprof analysis showed that 3 client threads each hold 400 MB of heap memory:
      • 200 MB from a HashMap which holds ResourceImpl instances as keys and Object as values
      • 200 MB from an org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork instance
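
As a side note, the BLOCKED-thread counts above came from analyzing the thread dumps by hand; the same count can be cross-checked in-process with the standard JDK ThreadMXBean API. This is a generic JDK snippet, not Ambari code (the class name is made up for illustration):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreadCount {

    // Counts live threads currently in Thread.State.BLOCKED,
    // i.e. waiting to acquire a monitor held by another thread.
    public static long countBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(false, false): skip locked-monitor and
        // locked-synchronizer details, we only need the thread states
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        long blocked = 0;
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("BLOCKED threads: " + countBlocked());
    }
}
```

Dropping something like this behind a debug endpoint (or running it via a JMX client against the live server) makes it easy to watch the blocked count climb while reproducing the issue.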

I am currently checking the Ambari Server source code through its GitHub repository, matching it against the thread stack traces, using one of the heavy memory-consuming threads mentioned earlier as a reference:

  • The deadlock occurs in the org.apache.ambari.server.api.query.QueryImpl#queryForResources method
  • While collecting results from the query, org.apache.ambari.server.controller.internal.ResourceImpl instances are inserted into a HashSet
  • Each insertion triggers a hash code computation on the ResourceImpl instance, and that hash code is computed from the hash code of an internal synchronized hash map
  • This hash map is the cause of the contention: since it is synchronized, every hashCode() call must acquire the map's monitor, serializing all access when used concurrently, and the hash code computed from the map uses an iterator which "fails fast on concurrent modification"
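
To illustrate the contention pattern described above with a minimal standalone sketch (not Ambari's actual code; the map contents and thread here are made up): a map wrapped with Collections.synchronizedMap guards hashCode() with the same monitor as every read and write, so any thread holding that monitor blocks all concurrent hashCode() computations:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

public class SyncMapHashDemo {
    public static void main(String[] args) throws InterruptedException {
        // Collections.synchronizedMap uses the wrapper itself as the mutex,
        // and its hashCode() override takes that mutex before delegating.
        Map<String, Object> map = Collections.synchronizedMap(new HashMap<>());
        map.put("key", "value");

        AtomicBoolean hashDone = new AtomicBoolean(false);
        Thread reader = new Thread(() -> {
            map.hashCode();        // must acquire the map's monitor
            hashDone.set(true);
        });

        synchronized (map) {       // simulate a writer holding the monitor
            reader.start();
            reader.join(300);      // give the reader a chance to run
            // The reader is BLOCKED: hashCode() cannot proceed while we
            // hold the monitor, which is exactly the thread-dump pattern.
            if (hashDone.get()) {
                throw new AssertionError("hashCode was not blocked as expected");
            }
        }

        reader.join();             // once the monitor is released, it completes
        if (!hashDone.get()) {
            throw new AssertionError("hashCode never completed");
        }
        System.out.println("hashCode blocked while monitor was held, then completed");
    }
}
```

With ~50 threads hammering that monitor on every HashSet insertion, the serialization alone would explain the BLOCKED pile-up, even before any fail-fast iterator exception.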

This problem is critical, as we need to restart the Ambari server quite often, which hurts operational efficiency. I am still looking for the root cause, but I would gladly appreciate some hints about where to look :)

1 REPLY

Re: Ambari Server deadlocks

@Félix Voituret

Can you please fine-tune Ambari by following https://community.hortonworks.com/articles/80635/optimize-ambari-performance-for-large-clusters.html

Also, you did not mention which Ambari version you are using. If you think there is a large amount of historical data, you can attempt to purge it by following https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-administration/content/purging-am...
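
For context, the tuning article above includes the server-side thread pool sizes in /etc/ambari-server/conf/ambari.properties. The values below are illustrative only; verify the right settings for your Ambari version and cluster size against the linked article:

```properties
# Thread pool serving client/REST requests (the ~25 client threads
# seen in the thread dumps correspond to this pool's default size)
client.threadpool.size.max=25
# Thread pool handling agent requests
agent.threadpool.size.max=25
```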

Let me know if any questions.