Created on 01-10-2018 08:14 PM - edited 09-16-2022 01:42 AM
In large clusters where multiple services make use of a single ZooKeeper quorum, each service's state store is maintained as znodes. The count of such znodes is therefore directly proportional to the number of services deployed and to the activity on the cluster.
If LLAP apps are deployed in such clusters, the registry must be enabled for Slider (by setting the property hadoop.registry.rm.enabled). This introduces overhead in the form of znode scans for all the application containers that are created and destroyed on a regular basis. The scan behavior is as described below:
If the property is set in core-site.xml or yarn-site.xml, the YARN ResourceManager will behave as follows:
1. On startup: create the initial root paths of /, /services and /users. On a secure cluster, access will be restricted to the system accounts (see below).
2. When a user submits a job: create the user path under /users.
3. When a container is completed: delete from the registry all service records with a yarn:persistence field of value container, and a yarn:id field whose value matches the ID of the completed container.
4. When an application attempt is completed: remove all service records with yarn:persistence set to application-attempt and yarn:id set to the application attempt ID.
5. When an application finishes: remove all service records with yarn:persistence set to application and yarn:id set to the application ID.
Ref: Registry scan
Consequently, the registry scan covers all znodes, not just those under the RM service znode. Even if there are only a few thousand (<10K) applications in /rmstore (/rmstore-secure), the scan starts from the root level (/). Once the count of znodes under the root exceeds the ~10K mark, these scans become expensive enough to cause connectivity issues between ZK and the RM, which leads to timeouts, RM failover, and overall RM instability. This is addressed in the Apache JIRA below.
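To gauge your exposure, you can count znodes directly with the ZooKeeper CLI. A minimal sketch, assuming zk1.example.com:2181 is a placeholder for one of your quorum members and the default /rmstore parent path; the numChildren field in the stat output is the number of znodes directly beneath each path:

    # Replace zk1.example.com:2181 with a member of your ZooKeeper quorum.
    # numChildren in the stat output = znodes directly under the given path.
    zkCli.sh -server zk1.example.com:2181 stat /
    zkCli.sh -server zk1.example.com:2181 stat /rmstore
    zkCli.sh -server zk1.example.com:2181 stat /registry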
ROOT CAUSE:
https://issues.apache.org/jira/browse/YARN-6136
RESOLUTION:
Implement the change in the ZK scan behavior tracked in the JIRA above.
WORKAROUND:
1. If LLAP (slider) is not used:
Disable hadoop.registry.rm.enabled (see the configuration sketch after this list).
2. If LLAP (slider) is used:
i) Assuming only LLAP uses Slider and nobody else is using the same ZK cluster, the only way to reduce ZK load is to lower yarn.resourcemanager.state-store.max-completed-applications to 3000 (see the configuration sketch after this list).
ii) If other services use the ZK quorum, please reach out to HWX support.
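As a sketch, the two workaround properties above would look like this (values are illustrative; hadoop.registry.rm.enabled belongs in core-site.xml or yarn-site.xml, the state-store limit in yarn-site.xml):

    <!-- Workaround 1: disable the registry when Slider/LLAP is not used -->
    <property>
      <name>hadoop.registry.rm.enabled</name>
      <value>false</value>
    </property>

    <!-- Workaround 2.i: cap the completed applications retained in the ZK state store -->
    <property>
      <name>yarn.resourcemanager.state-store.max-completed-applications</name>
      <value>3000</value>
    </property>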
Created on 04-11-2018 12:22 PM
Hello @kkanchu
I am also facing this issue.
Following your suggestion, I disabled hadoop.registry.rm.enabled.
But I am curious about the service records; it seems the record cleanup will not take place.
In this case, where can I find the records?
Created on 04-11-2018 02:26 PM
By "Service record", do you mean the znode in ZK service?
Created on 04-12-2018 01:49 AM
Hello @kkanchu
Thanks for your prompt reply.
Yes, the records I meant are the znodes in ZooKeeper.
When hadoop.registry.rm.enabled was enabled, I could find many empty folders under /registry/services and /registry/users.
With hadoop.registry.rm.enabled disabled, even the /registry directory is not generated.
I read the documentation here, which says:
If the property hadoop.registry.rm.enabled is set to false, the RM will not interact with the registry —and the listed operations will not take place. The root paths may be created by other means, but service record cleanup will not take place.
So I am curious: if I disable this parameter and don't handle these records, will there be any unknown side effects?
BTW, Slider is not used in my cluster.
Created on 04-12-2018 06:54 AM
By disabling it, znode creation and cleanup are not performed, and since the registry is disabled, the load from parsing the ZK hierarchy is relieved.
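If stale entries were left behind under /registry from before the change, they can be inspected and, provided nothing else on the cluster uses the registry path, removed manually with the ZooKeeper CLI. A sketch (the host is a placeholder; rmr is the recursive delete in ZooKeeper 3.4 zkCli, renamed deleteall in later releases):

    # List whatever is left under the registry root.
    zkCli.sh -server zk1.example.com:2181 ls /registry
    # Recursively delete the leftover tree, only if no other service uses it.
    zkCli.sh -server zk1.example.com:2181 rmr /registry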
Created on 04-12-2018 09:12 AM
Hello @kkanchu
I got it, thank you very much.