Choosing an approach for implementing Kerberos on a Hadoop cluster is critical from a long-term maintenance standpoint. Enterprises have their own security policies and guidelines, and a successful Kerberos implementation needs to adhere to the enterprise security architecture. There are multiple guides on how to implement Kerberos, but I couldn't find information on which approach to choose or on the pros and cons associated with each approach.
In a Hortonworks Hadoop cluster, there are three different ways of generating and managing keytabs and principals.
a. Use an MIT KDC specific to Hadoop cluster - automated keytab management using Ambari
A KDC specific to the Hadoop cluster can be installed and maintained on one of the Hadoop nodes. All users and keytabs required for the Kerberos implementation are managed automatically by Ambari.
Pros:
Enterprise security teams are not involved in the KDC setup. Hadoop administrators have complete control of the KDC installation.
Keytab management is automated by Ambari. There is no need to manage service keytabs manually when the cluster configuration or topology changes.
Non-expiring keytabs can be generated and distributed to Hadoop developers. Developers can keep a copy of the keytab tied to their own ID.
A one-way cross-realm trust can be set up between the Hadoop KDC and the enterprise Active Directory, so that enterprise AD users can authenticate to the cluster.
Cons:
May be against enterprise security policies.
Hadoop administrators take on the additional responsibility of managing the KDC. Any security vulnerabilities are the responsibility of the Hadoop administrators.
Setting up the KDC for high availability and disaster recovery is also the responsibility of the Hadoop administrators.
Developer keytabs have to be generated manually. For every new developer, a new keytab needs to be generated and distributed by the Hadoop administrators.
Procedures need to be set up for handling lost keytabs.
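The manual developer-keytab step can be sketched roughly as below. This is only an illustration, assuming an MIT KDC where `kadmin.local` is run on the KDC host; the function name, realm, user and keytab directory are hypothetical. The script only builds the commands and does not run them.

```python
from typing import List


def developer_keytab_commands(user: str, realm: str, keytab_dir: str) -> List[List[str]]:
    """Build (but do not run) the kadmin.local commands for one developer keytab."""
    principal = f"{user}@{realm}"
    keytab = f"{keytab_dir}/{user}.keytab"
    return [
        # Create the principal with a random key instead of a password.
        ["kadmin.local", "-q", f"addprinc -randkey {principal}"],
        # Export the key to a keytab file. By default ktadd generates a new
        # random key, so keytabs exported earlier for this principal stop working.
        ["kadmin.local", "-q", f"ktadd -k {keytab} {principal}"],
    ]


if __name__ == "__main__":
    # Hypothetical user, realm and directory, for illustration only.
    for cmd in developer_keytab_commands("alice", "HADOOP.EXAMPLE.COM", "/etc/security/keytabs"):
        print(" ".join(cmd))
```

Because `ktadd` re-randomizes the key by default, re-running the second command is one possible basis for a lost-keytab procedure: the lost copy becomes useless once a new keytab is exported.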
b. Use an existing Enterprise Active Directory - Manual setup
An alternative to having a local KDC for the Hadoop cluster is to use Ambari to generate the list of usernames and principals required for Kerberos, and then create these users manually in the corporate AD.
Pros:
Meets enterprise security standards by leveraging the existing corporate AD infrastructure.
Developers are already part of the existing AD, so no keytabs need to be generated for them.
Cons:
Manually managing keytabs in a large cluster becomes tedious and difficult to maintain with continuous changes to the cluster structure.
Any change in the Hadoop cluster structure (adding or deleting a node, adding or deleting a service on a node) requires new keytabs to be generated and distributed.
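To give a feel for the manual bookkeeping involved, here is a minimal sketch that compares two cluster layouts and lists which service principals (and therefore keytabs) would have to be created or retired by hand. It assumes the usual `service/host@REALM` naming for service principals; the function names, hosts, service short names and realm are made up for illustration.

```python
from typing import Dict, List, Set


def service_principals(topology: Dict[str, Set[str]], realm: str) -> Set[str]:
    """Return one principal per (service, host) pair, named service/host@REALM."""
    return {
        f"{service}/{host}@{realm}"
        for host, services in topology.items()
        for service in services
    }


def keytab_changes(old: Dict[str, Set[str]], new: Dict[str, Set[str]], realm: str) -> Dict[str, List[str]]:
    """List the principals to create and to retire when moving from old to new."""
    before = service_principals(old, realm)
    after = service_principals(new, realm)
    return {"create": sorted(after - before), "retire": sorted(before - after)}


if __name__ == "__main__":
    # Hypothetical topology change: one DataNode host is added.
    old = {"node1.example.com": {"nn", "dn"}}
    new = {"node1.example.com": {"nn", "dn"}, "node2.example.com": {"dn"}}
    print(keytab_changes(old, new, "EXAMPLE.COM"))
```

Every entry in the "create" list stands for an AD account request plus a keytab that has to be copied to the right host, which is where the manual approach gets tedious.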
c. Use an existing Enterprise AD with automated management using Ambari
In this approach, a new organizational unit (OU) is created in the enterprise AD, and an AD account is created with full administrative privileges on the new OU. This account and OU are then used during the automated setup in Ambari, which allows Ambari to manage principal generation, keytab generation and keytab distribution automatically. The OU holds all the principals for the Hadoop internal users required for Kerberos functionality.
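As a rough illustration of what the setup needs as input, the sketch below derives the AD-related values from a domain name and an OU name. The key names are modeled on Ambari's `kerberos-env` settings as I remember them and should be treated as assumptions to verify against your Ambari version; the host, domain and OU are hypothetical, and using the upper-cased domain as the realm is only the common AD convention.

```python
from typing import Dict


def ad_kerberos_settings(ad_host: str, domain: str, ou_name: str) -> Dict[str, str]:
    """Derive illustrative Kerberos settings for an AD-backed setup."""
    # "corp.example.com" becomes "DC=corp,DC=example,DC=com".
    base_dn = ",".join(f"DC={part}" for part in domain.split("."))
    return {
        "kdc_type": "active-directory",          # assumed key name and value
        "kdc_hosts": ad_host,                    # assumed key name
        "realm": domain.upper(),                 # convention, not a requirement
        "ldap_url": f"ldaps://{ad_host}:636",    # LDAP over SSL to the AD host
        "container_dn": f"OU={ou_name},{base_dn}",  # the dedicated Hadoop OU
    }


if __name__ == "__main__":
    # Hypothetical AD host, domain and OU name.
    settings = ad_kerberos_settings("ad01.corp.example.com", "corp.example.com", "Hadoop")
    for key, value in settings.items():
        print(f"{key} = {value}")
```

The container DN is the piece that ties the setup to the dedicated OU, so the delegated AD account only needs rights on that one container.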
Pros:
Satisfies corporate security policies, since complete auditing of user creation and maintenance is available within AD.
All developers and users are part of the enterprise AD and already have a Kerberos ticket issued to them. Existing tickets are used for any communication with the Kerberized cluster.
Backup, high availability and other administrative tasks for the KDC are taken care of by the enterprise teams managing AD.
A separate OU within AD ensures that Hadoop internal users are not mixed with other users in AD.
Any existing Active Directory groups are available in Ranger for implementing security policies.
Keytab generation and distribution for all Hadoop internal users is automated.
Changes to the cluster topology or configuration are handled by Ambari.
Cons:
Any manually managed service users (with non-expiring passwords) for the Hadoop cluster need to be added to Active Directory manually, and their keytabs distributed manually. This may require service requests to other enterprise groups to generate the new IDs and keytabs.
Developers do not have access to keytabs associated with their own IDs. Keytabs tied to developer IDs are invalidated by password change policies (password expiration after a certain number of days). Developers can instead use the ticket issued to their ID by Active Directory.
Some Java applications and tools require a copy of a keytab file. It may be difficult to find a workaround that lets these applications and tools use cached tickets.
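For scripts and tools that can work with cached tickets, a simple pattern is to check the ticket cache first and fall back to a keytab only where one exists (for example, for a service user). A minimal sketch, assuming the MIT Kerberos `klist` and `kinit` clients are on the PATH; the function names, principal and keytab path are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional


def run_status(argv: List[str]) -> int:
    """Run a command quietly and return its exit status (127 if not installed)."""
    try:
        return subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
    except FileNotFoundError:
        return 127


def has_valid_ticket(run: Callable[[List[str]], int] = run_status) -> bool:
    # "klist -s" prints nothing and exits 0 only when the credential cache
    # is readable and not expired.
    return run(["klist", "-s"]) == 0


def login_command(
    principal: str,
    keytab: Optional[str],
    run: Callable[[List[str]], int] = run_status,
) -> Optional[List[str]]:
    """Return the kinit command to run, or None if a cached ticket is usable."""
    if has_valid_ticket(run):
        return None
    if keytab:
        # Non-interactive login from a keytab (service users).
        return ["kinit", "-kt", keytab, principal]
    # Interactive login with the AD password (developers).
    return ["kinit", principal]


if __name__ == "__main__":
    # Hypothetical service principal and keytab path.
    print(login_command("etl_svc@CORP.EXAMPLE.COM", "/etc/security/keytabs/etl_svc.keytab"))
```

This does not help with applications that insist on reading a keytab file themselves; those still need a service ID with a manually issued keytab.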
This is a preliminary guide based on my experience implementing Kerberos. Any other suggestions or ideas are welcome.