Created on 12-08-2015 01:07 AM - edited 09-16-2022 01:33 AM
Q: What are the use cases for Centrify?
A: Centrify is used to integrate AD with Linux when no local users are deployed, similar to how SSSD is configured. It fits when the cluster or the number of domains is large enough to be hard to manage with SSSD; Centrify greatly simplifies the management of this type of environment. Centrify ldapproxy also abstracts the complexity of integrating with filers and appliances such as Isilon and NetApp. ldapproxy fronts Hadoop and does not expose the internals of AD; it only gives out information about a zone. It is usually used for machine-to-machine authentication.
Q: If there are multiple domains in a forest, how does Centrify know which domain controller to use to authenticate a user?
A: Centrify walks the forest tree and figures out which domain controller to use to authenticate the user. It uses the Domain Controller Service to do this and does not use the krb5.conf file. The Centrify agent knows which forest and domain controller it belongs to. It is PAM and site aware, and its base authentication mechanism is Kerberos. It builds an index of domain controllers and DNS servers and tags them by response time. Based on this information, the agent knows when a particular DC or DNS server has issues and stops using it. DNS must be set up properly, including reverse lookup, for Centrify to work. The agent supports authenticating the same user name that exists in different domains.
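As a rough illustration of the response-time tagging described above (this is not Centrify's actual algorithm, which is not documented here), the idea can be sketched in Python. The Probe class, the rank_controllers function and the host names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    """One measurement of a domain controller (hypothetical structure)."""
    host: str
    response_ms: Optional[float]  # None means the probe got no answer


def rank_controllers(probes):
    """Return reachable hosts, fastest first; unreachable hosts are dropped."""
    healthy = [p for p in probes if p.response_ms is not None]
    return [p.host for p in sorted(healthy, key=lambda p: p.response_ms)]


probes = [
    Probe("dc1.example.com", 12.5),
    Probe("dc2.example.com", None),  # did not respond, so it is skipped
    Probe("dc3.example.com", 4.0),
]
print(rank_controllers(probes))
```

A real agent would repeat the probes over time so that a recovered controller can rejoin the list.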
Q: What happens when Centrify agents fail?
A: Centrify does not store AD information, and there is no such thing as a policy server. It relies entirely on the AD infrastructure to scale out. Centrify DirectControl (CDC) watches all Centrify agents and restarts them when they fail. See the diagram below for reference. If all else fails, Centrify can fall back to NTLM. For example, some users do not have Kerberos enabled on their laptops due to inherent issues and have to resort to NTLM.
Q: What's the best practice for laying out the Centrify policies?
A: The basic building block of Centrify policies is the zone. A zone is how Centrify organizes its data inside AD, and a zone is the unit that represents a cluster. The data is essentially user information, UNIX group information, UNIX computer information, role-based access control and more. This works because of the Service Connection Point. A Service Connection Point is a multi-diag object that has been available since Windows 2003, and Centrify links it back to the real AD object. This gives flexibility in naming zones and in choosing which AD objects to link them to.
The Service Connection Point can be seen in the "Active Directory Users and Computers" window as shown below. These service connection points are what PAM uses to authenticate users.
The image below describes the best-practice layout for creating policies in Centrify.
All users are defined under Zones->UNIX Data->Users. Remember, all users and groups are created in AD. What shows up here are just pointers to the AD user objects. These users can later be brought into the child zones. The Hadoop cluster is the boundary for Centrify policies. No Hadoop node should belong to multiple zones. The only exception is a host running an RDBMS for the Hadoop components that need one, e.g. Ambari, Oozie, Hive. Centrify agents support multiple domains where the same user exists across domains. Hadoop jobs pick up the real AD user.
It is best to name child zones in lower case, and the name must match the Hadoop cluster name. In the sample policy above, "smesecurity" is the name of the child zone and also the name of the Hadoop cluster, with matching case. Only the nodes within this cluster should exist in Zones->Global->Child Zones->Computers.
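The naming and membership rules above can be checked mechanically. The following is a minimal sketch, assuming the zone membership has already been exported into a plain dictionary; the check_layout function, its parameters and the host names are hypothetical, and nothing here calls a Centrify API.

```python
def check_layout(cluster_name, zone_name, zone_members, shared_hosts=frozenset()):
    """Return a list of problems found in a (hypothetical) zone layout.

    zone_members maps a zone name to the set of host names joined to it.
    shared_hosts lists hosts allowed in several zones, e.g. RDBMS servers.
    """
    problems = []
    if zone_name != zone_name.lower():
        problems.append(f"zone '{zone_name}' is not lower case")
    if zone_name != cluster_name:
        problems.append(f"zone '{zone_name}' does not match cluster '{cluster_name}'")
    seen = {}
    for zone, hosts in sorted(zone_members.items()):
        for host in sorted(hosts):
            if host in shared_hosts:
                continue  # documented exception: shared database hosts
            if host in seen:
                problems.append(f"{host} is in both '{seen[host]}' and '{zone}'")
            else:
                seen[host] = zone
    return problems


members = {"smesecurity": {"node1", "node2"}, "otherzone": {"node2", "db1"}}
for problem in check_layout("smesecurity", "smesecurity", members):
    print(problem)
```

An empty result means the layout follows the conventions described here.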
The global users are not automatically pushed down to child zones. They have to be added explicitly. For users to log in to Linux machines successfully, they must have a complete profile: UID, GID, and a Role Assignment. The Role Assignment grants the access. There will be users that exist in the child zones but not in the parent zone. These are normally service accounts that live only in the child zone. The OU structure has to line up with how the zones are structured. This is the best practice.
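The "complete profile" rule can be expressed as a small check. This is only a sketch of the rule as stated above; the ZoneProfile class, the missing_parts function and the role name are hypothetical and do not correspond to Centrify objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ZoneProfile:
    """A user's UNIX profile in one zone (hypothetical structure)."""
    login: str
    uid: Optional[int] = None
    gid: Optional[int] = None
    roles: List[str] = field(default_factory=list)


def missing_parts(profile):
    """List what the profile still needs before the user can log in."""
    missing = []
    if profile.uid is None:
        missing.append("UID")
    if profile.gid is None:
        missing.append("GID")
    if not profile.roles:
        missing.append("Role Assignment")
    return missing


print(missing_parts(ZoneProfile("alice", uid=1001, gid=1001, roles=["hadoop-user"])))
print(missing_parts(ZoneProfile("bob", uid=1002)))
```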
It is possible to redefine the same user in the child zone with different properties, which overrides what is defined globally for that user.
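One way to picture the override is as a merge in which any attribute set in the child zone wins over the global value. The sketch below assumes profiles are plain dictionaries; the effective_profile function and the attribute names are hypothetical.

```python
def effective_profile(global_profile, child_override):
    """Merge two dicts of UNIX attributes; child-zone values win when set."""
    merged = dict(global_profile)
    # Attributes left as None in the child zone fall back to the global value.
    merged.update({k: v for k, v in child_override.items() if v is not None})
    return merged


global_profile = {"login": "alice", "uid": 1001, "gid": 1001, "shell": "/bin/bash"}
child_override = {"gid": 5000, "shell": None}
print(effective_profile(global_profile, child_override))
```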
For large cluster installations, it is easier to use ZPA (Zone Provisioning Agent), part of the DirectControl component, which automates the creation of user profiles in Centrify: you just drop the user into AD groups. This is done through PowerShell or the Linux/Unix command interface.
All the policy information entered in Centrify is stored in AD. See below. The green box shows everything that was defined in Centrify, including the "smesecurity" child zone. It is also replicated across Active Directory servers for redundancy.
Q: Centrify creates service principals for nfs and http. Will this create issues with Kerberizing HDP?
A: Yes. Centrify has its own Kerberos module for nfs and http. When a cluster is Kerberized with Ambari, Ambari automatically generates principals for the nfs and http services, and these clash with Centrify's. To prevent issues, edit the file /etc/centrifydc/centrifydc.conf on all machines and look for the property adclient.krb5.service.principals. Remove the "nfs" and "http" entries. It should look like this.
adclient.krb5.service.principals: ftp cifs
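When this has to be done on many machines, the edit can be scripted. The following is a minimal sketch that works on the text of the file only; the strip_clashing_principals function is hypothetical, it assumes the "key: value" line format shown above, and it leaves commented-out lines untouched.

```python
PROPERTY = "adclient.krb5.service.principals"
CLASHING = {"nfs", "http"}


def strip_clashing_principals(conf_text):
    """Return conf_text with nfs and http removed from the property line."""
    out = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        # Only an active (uncommented) line for this exact property is changed.
        if sep and key.strip() == PROPERTY:
            kept = [s for s in value.split() if s.lower() not in CLASHING]
            line = f"{PROPERTY}: {' '.join(kept)}"
        out.append(line)
    return "".join(line + "\n" for line in out)


sample = (
    "# sample fragment\n"
    "adclient.krb5.service.principals: ftp cifs nfs http\n"
    "other.key: value\n"
)
print(strip_clashing_principals(sample), end="")
```

Back up the original file before writing the result over it.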
If the nfs and http entries were not removed before the Kerberos wizard in Ambari was run, NFS Gateway, DataNode and other components that depend on http will fail. To resolve this, update centrifydc.conf on all machines and remove nfs and http as described above. Also remove the http and nfs SPNs from AD. Then, on all machines, restart the Centrify agent.
# service centrifydc restart
Q: Centrify ldapproxy won't start using TLS. Certificate cannot be found.
A: ldapproxy failing to start is normally caused by certificate names or casing not matching between AD and Centrify. Check that the names of the certificates in /var/centrify/net/certs/ match. Make sure that the file /etc/centrifydc/openldap/slapd.conf has entries for the Centrify certificate. See the sample below.
# Centrify specific
Q: How does Centrify computer roles play into Hadoop clusters?
A: Computer roles allow you to define a set of rights for a logical group of computers. Ambari, Oozie and Hive Metastore all use RDBMS systems, and there is a growing trend of organizations preferring Oracle and SQL Server. For example, an Oracle Admin and the Oracle servers are defined in computer roles, and the admin rights are applied to these servers regardless of location. These servers can be used by multiple Hadoop clusters. Computer role assignments can be provisioned at the zone level or at the node level. Control of a zone, its computers and its users can also be delegated from within the zone, to specify which group has admin rights to it (not root rights but AD rights; see the image below).
Q: When a new AD is added to the forest, how does Centrify pick it up?
A: There are configuration settings that allow Centrify agents to automatically walk the tree of AD domains and discover new AD servers within the forest. The discovery process is time based, and the interval can be changed. The agents also keep track of which AD controllers are up or down. There are PTR records in the AD DNS Manager, as shown below, that Centrify agents use to discover Domain Controllers and Global Catalog servers.
Q: Linux servers have their own DNS services and AD has its own built-in directory services. It's a painful process to point the Linux servers to AD and build PTR records for them. How does Centrify make this more seamless?
A: Centrify supports integrating with two different DNS environments (e.g. hortonworks.net and hortonworks.com) through a feature called "alias". Though possible and supported, it is not recommended to set up Centrify and Hadoop with this type of configuration.
Q: What's the behavior of Centrify when a user logs in to machines using ssh?
A: If the user logs in with a password, a Kerberos ticket is generated automatically. If an ssh key is used, the ticket is not generated automatically and the user has to run kinit. When forwardable tickets are turned on in Windows Kerberos systems, the user does not have to kinit again.
Q: How does Centrify sync with latest AD changes?
A: Centrify has a utility called adflush to pull down the changes from AD. It could be an expensive process depending on what information is being pulled down. adflush will be a perfect tool for developers in POC mode.
Q: How can I blacklist users in Centrify?
A: Enter the users that you want to block in the file /etc/centrifydc/user.ignore.
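Adding names to the ignore file can be scripted so that repeated runs do not create duplicates. This sketch works on the file's text only and assumes the file holds one user name per line; the merge_ignore_list function and the user names are hypothetical.

```python
def merge_ignore_list(existing_text, usernames):
    """Return ignore-file text with usernames appended, skipping duplicates."""
    lines = existing_text.splitlines()
    known = {line.strip() for line in lines}
    for name in usernames:
        if name and name not in known:
            lines.append(name)
            known.add(name)
    return "".join(line + "\n" for line in lines)


current = "root\n"
print(merge_ignore_list(current, ["baduser", "root", "baduser"]), end="")
```

Read the current file into a string, pass it through the function, and write the result back.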
Q: How do you safely snapshot Centrify?
A: If you snapshot a machine that runs the Centrify agent and later roll back to a snapshot, the machine will not be able to authenticate with AD if the keytab file has changed since that snapshot was taken. When rolling back to a specific version, make sure the keytab in that snapshot is the same as the current one.
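One way to know whether a keytab has changed is to record a fingerprint of it when the snapshot is taken and compare it before rolling back. The sketch below uses a SHA-256 hash for this; the function names are hypothetical, and the keytab location is left as a parameter because it is not stated here.

```python
import hashlib


def fingerprint(data):
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def file_fingerprint(path):
    """Fingerprint a file on disk, e.g. the machine's keytab."""
    with open(path, "rb") as handle:
        return fingerprint(handle.read())


def keytab_changed(recorded_fingerprint, current_fingerprint):
    """True when the keytab differs from the one seen at snapshot time."""
    return recorded_fingerprint != current_fingerprint


at_snapshot = fingerprint(b"keytab bytes at snapshot time")
now = fingerprint(b"keytab bytes after a key change")
print(keytab_changed(at_snapshot, now))
```

Store the recorded fingerprint outside the snapshot so that it survives the rollback.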
Q: With a very large cluster (in the thousands of nodes), how do you scale with Centrify and AD?
A: It is recommended to deploy the domain controllers in the same rack space as the Hadoop nodes. You want your AD to be replicated: Hadoop will hammer AD with requests, and AD must be able to handle the load. Centrify is agent based, so the agents themselves do not limit scaling. The agents know which domain controllers to go to and which ones they can reach fastest.