Created on 01-10-2018 08:14 PM - edited 09-16-2022 01:42 AM
In large clusters where multiple services make use of a single ZooKeeper quorum, each service's state store is maintained as znodes. The count of such znodes is therefore directly proportional to the number of services deployed and to the activity on the cluster.
If LLAP apps are deployed in such clusters, the registry must be enabled for Slider (by setting the property hadoop.registry.rm.enabled). This introduces overhead in the form of znode scans for all the application containers that are created and destroyed on a regular basis. The scan behavior is as described below:
If the property is set in core-site.xml or yarn-site.xml, the YARN ResourceManager will behave as follows:
1. On startup: create the initial root paths of /, /services and /users. On a secure cluster, access will be restricted to the system accounts (see below).
2. When a user submits a job: create the user path under /users.
3. When a container is completed: delete from the registry all service records with a yarn:persistence field of value container, and a yarn:id field whose value matches the ID of the completed container.
4. When an application attempt is completed: remove all service records with yarn:persistence set to application-attempt and yarn:id set to the application attempt ID.
5. When an application finishes: remove all service records with yarn:persistence set to application and yarn:id set to the application ID.
Ref: Registry scan
Consequently, the registry scan covers all znodes, not just those under the RM service znode. Even if there are only a few thousand (<10K) applications in /rmstore (/rmstore-secure), the scan starts from the root level (/). Once the count of znodes under the root exceeds the ~10K mark, these scans become expensive enough to cause connectivity issues between ZK and the RM, which leads to timeouts, RM failover, and overall RM instability. This is addressed in the Apache JIRA below.
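To gauge your exposure, you can count znodes directly with the ZooKeeper CLI. A minimal sketch, assuming zk1.example.com:2181 is a placeholder for one of your quorum members and the default /rmstore parent path; the numChildren field in the stat output is the number of znodes directly beneath each path:

    # Replace zk1.example.com:2181 with a member of your ZooKeeper quorum.
    # numChildren in the stat output = znodes directly under the given path.
    zkCli.sh -server zk1.example.com:2181 stat /
    zkCli.sh -server zk1.example.com:2181 stat /rmstore
    zkCli.sh -server zk1.example.com:2181 stat /registry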
ROOT CAUSE:
https://issues.apache.org/jira/browse/YARN-6136
RESOLUTION:
Implement the change in the ZK scan behavior tracked in the JIRA above.
WORKAROUND:
1. If LLAP (slider) is not used:
Disable hadoop.registry.rm.enabled (see the configuration sketch after this list).
2. If LLAP (slider) is used:
i) Assuming only LLAP uses Slider and nobody else is using the same ZK cluster, the only way to reduce ZK load is to lower yarn.resourcemanager.state-store.max-completed-applications to 3000 (see the configuration sketch after this list).
ii) If other services use the ZK quorum, please reach out to HWX support.
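As a sketch, the two workaround properties above would look like this (values are illustrative; hadoop.registry.rm.enabled belongs in core-site.xml or yarn-site.xml, the state-store limit in yarn-site.xml):

    <!-- Workaround 1: disable the registry when Slider/LLAP is not used -->
    <property>
      <name>hadoop.registry.rm.enabled</name>
      <value>false</value>
    </property>

    <!-- Workaround 2.i: cap the completed applications retained in the ZK state store -->
    <property>
      <name>yarn.resourcemanager.state-store.max-completed-applications</name>
      <value>3000</value>
    </property>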
Created on 04-11-2018 12:22 PM
Hello @kkanchu
I am also facing this issue.
Following your suggestion, I disabled hadoop.registry.rm.enabled.
But I am curious about the service records; it seems the record cleanup will not take place.
In this case, where can I find the records?
Created on 04-11-2018 02:26 PM
By "Service record", do you mean the znode in ZK service?
Created on 04-12-2018 01:49 AM
Hello @kkanchu
Thanks for your prompt reply.
Yes, the records I meant are the znodes in ZooKeeper.
When hadoop.registry.rm.enabled was enabled, I could find many empty folders under /registry/services and /registry/users.
With hadoop.registry.rm.enabled disabled, even the /registry directory is not generated.
I read the documentation here, which says:
If the property hadoop.registry.rm.enabled is set to false, the RM will not interact with the registry —and the listed operations will not take place. The root paths may be created by other means, but service record cleanup will not take place.
So I am curious: if I disable this parameter and don't handle these records, will there be any unknown side effects?
BTW, Slider is not used in my cluster.
Created on 04-12-2018 06:54 AM
By disabling it, znode creation and cleanup are not performed, and since the registry is disabled, the load from parsing the ZK hierarchy is relieved.
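If stale entries were left behind under /registry from before the change, they can be inspected and, provided nothing else on the cluster uses the registry path, removed manually with the ZooKeeper CLI. A sketch (the host is a placeholder; rmr is the recursive delete in ZooKeeper 3.4 zkCli, renamed deleteall in later releases):

    # List whatever is left under the registry root.
    zkCli.sh -server zk1.example.com:2181 ls /registry
    # Recursively delete the leftover tree, only if no other service uses it.
    zkCli.sh -server zk1.example.com:2181 rmr /registry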
Created on 04-12-2018 09:12 AM
Hello @kkanchu
I got it, thank you very much.