We have multiple Hadoop clusters running in production. In each cluster, all Hadoop service user accounts, as well as the end applications' service user accounts, are prefixed with the cluster id. For example, cluster01 accounts look like hdp01-hdfs, hdp01-yarn, etc., and cluster02 accounts look like hdp02-hdfs, hdp02-yarn, etc.
We are now in the process of setting up a Disaster Recovery (DR) cluster for each of the production clusters. Our clusters are secured with Active Directory and integrated with Centrify. So I am wondering what the best strategy is for managing security and access privileges so that data accessed on a production cluster has the same privileges on its DR cluster, given that every service user account in Active Directory is prefixed with the cluster id.
For example, let's assume that the PROD cluster is hdp01, the DR cluster is hdp02, and an application user who needs access to the /data directory on the PROD cluster has the AD account hdp01-app-user. We want to ensure that the hdp01-app-user account has the same access privileges (managed through either HDFS ACLs or Ranger policies) on hdp02 as well.
We explored the auth_to_local setting (hadoop.security.auth_to_local) in core-site.xml so that, on hdp02, hdp01 users would be mapped to their hdp02 equivalents. For example: RULE:[1:$1@$0](hdp.*-hdfs@AD.DOMAIN.COM)s/.*/hdp02-hdfs/
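To make the behaviour of such a rule easier to reason about, here is a rough Python approximation of how a single RULE:[n:format](regex)s/from/to/ entry turns a principal into a short name. It is a sketch, not Hadoop's implementation: it ignores the DEFAULT rule, the /g flag and other details. The helper names (apply_rule, map_principal, RULES_ON_HDP02) and the second, generalised hdp01-* to hdp02-* rule are hypothetical illustrations, and Python writes backreferences as \1 where the core-site.xml rule would use Java's $1.

```python
import re
from typing import List, Optional, Tuple

# One rule: (num_components, format, match_regex, sed_from, sed_to),
# mirroring RULE:[n:format](match_regex)s/sed_from/sed_to/
Rule = Tuple[int, str, str, str, str]


def apply_rule(principal: str, rule: Rule) -> Optional[str]:
    """Approximate a single auth_to_local rule; None means it does not apply."""
    num_components, fmt, match_regex, sed_from, sed_to = rule
    name, _, realm = principal.partition("@")
    components = name.split("/")
    if len(components) != num_components:
        return None
    # $0 is the realm, $1..$n are the principal's name components.
    candidate = fmt.replace("$0", realm)
    for i, comp in enumerate(components, start=1):
        candidate = candidate.replace(f"${i}", comp)
    # The whole candidate string must match the (regex) filter.
    if re.fullmatch(match_regex, candidate) is None:
        return None
    # Without a trailing /g, only the first match is substituted.
    return re.sub(sed_from, sed_to, candidate, count=1)


def map_principal(principal: str, rules: List[Rule]) -> Optional[str]:
    """Try rules in order and return the first short name produced, if any."""
    for rule in rules:
        short_name = apply_rule(principal, rule)
        if short_name is not None:
            return short_name
    return None


# Hypothetical rule set for hdp02.
RULES_ON_HDP02: List[Rule] = [
    # The rule from the question: RULE:[1:$1@$0](hdp.*-hdfs@AD.DOMAIN.COM)s/.*/hdp02-hdfs/
    (1, "$1@$0", r"hdp.*-hdfs@AD.DOMAIN.COM", r".*", "hdp02-hdfs"),
    # Hypothetical generalisation: hdp01-<anything> -> hdp02-<anything>.
    (1, "$1@$0", r"hdp01-.*@AD\.DOMAIN\.COM", r"^hdp01-(.*)@.*$", r"hdp02-\1"),
]

if __name__ == "__main__":
    for p in ["hdp01-hdfs@AD.DOMAIN.COM",
              "hdp01-app-user@AD.DOMAIN.COM",
              "jdoe@AD.DOMAIN.COM"]:
        print(p, "->", map_principal(p, RULES_ON_HDP02))
```

Run as a script, this prints hdp02-hdfs for the Hadoop service principal, hdp02-app-user for the application principal, and None for a principal such as jdoe@AD.DOMAIN.COM that no rule in this list matches (in Hadoop, evaluation would then continue with any later rules such as DEFAULT). Note that the sketch only covers mapping a principal to a user name; it does not touch group names.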
But I'm not sure this would work seamlessly in all cases, i.e. for both Hadoop service users and application users. Would there be any issues with non-Hadoop users and groups?
Given the above context, what is the best or recommended approach for handling AD integration across both the PROD and DR clusters? Would mirroring the hdp01 AD accounts to hdp02 help?