I'm trying to understand and secure my Cloudera cluster managed by CM 5.13. I was going with the "Local MIT KDC with Active Directory Integration" approach. I have set up a local MIT KDC (the server is not Kerberized yet) and also a local OpenLDAP server. Since I don't have a Windows server, I'm using OpenLDAP instead to provide the directory services; it can be replaced with AD later on.
My question is: in order to establish a cross-realm trust between the MIT KDC and OpenLDAP, do I need to configure a KDC within OpenLDAP as well (since AD comes with a KDC)? How do I establish a trust relationship with OpenLDAP? Is there a tutorial?
Created 11-13-2018 07:43 AM
It is not clear what role the OpenLDAP server will fulfill. What information are you storing there, and how will it be used by the Hadoop cluster? OpenLDAP is an LDAP server only, so you can't really add a KDC to it. Do you mean that you are using IPA, perhaps?
If your service principals are stored in the MIT KDC and your users also exist in the MIT KDC, there is no need for cross-realm trust. Cross-realm trust is only required if your Hadoop cluster's realm differs from the users' realm.
For example, if your users existed in Active Directory and authenticated to AD, but you wanted to allow those users access to Hadoop, you would need to configure one-way cross-realm trust.
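As a rough illustration of that rule, here is a minimal Python sketch that compares the realm of a user principal with the cluster's realm. The function names and the AD.EXAMPLE.COM realm are made up for the example; this is not something Hadoop or Cloudera Manager runs.

```python
def realm_of(principal: str) -> str:
    """Return the realm part of a Kerberos principal such as user@REALM."""
    if "@" not in principal:
        raise ValueError("principal has no realm: " + principal)
    return principal.rsplit("@", 1)[1]


def needs_cross_realm_trust(user_principal: str, cluster_realm: str) -> bool:
    """Trust is only needed when the user's realm differs from the cluster's realm."""
    # Realm names are case-sensitive, so compare them exactly.
    return realm_of(user_principal) != cluster_realm


# Users and services both in the MIT KDC realm: no trust needed.
print(needs_cross_realm_trust("bgooley@EXAMPLE.COM", "EXAMPLE.COM"))
# Users in a separate (hypothetical) AD realm: trust needed.
print(needs_cross_realm_trust("bgooley@AD.EXAMPLE.COM", "EXAMPLE.COM"))
```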
Created 11-13-2018 11:48 AM
So after some reading, I've realized that there is no need for a cross-realm trust, since there will be no secondary KDC. I'm using OpenLDAP for centralized user management since there was no AD available. It will store the user accounts and groups. I don't think I can store group information in the KDC, hence a directory service, i.e. OpenLDAP.
Now I'm using OpenLDAP as the backend for the KDC, so that any principals added to the realm are stored in the directory, following this guide (https://web.mit.edu/kerberos/krb5-latest/doc/admin/conf_ldap.html).
However, I'm still confused about how these users in the directory (created through kadmin) relate to the normal POSIX users. How would they be integrated so that there is only one entry per user in the directory?
I'm new to the security side of clusters, and I'm not sure whether OpenLDAP is actually used in such scenarios, since most posts mention only AD, or what exactly the industry best practice is here. Appreciate your response!
Created 11-14-2018 05:39 PM
OpenLDAP is just fine for Hadoop LDAP purposes. Active Directory is part of many existing IT infrastructures, so it is often used because it combines LDAP and Kerberos (along with other things).
Users in your Kerberos KDC and your LDAP server do not necessarily need to originate in the same object.
Any true relationship between the two, where the KDC principal lives in the same end-user object that is used for authentication, would come from some sort of integration at the KDC / LDAP server level. This is not necessary for Hadoop services to work.
In general, you need three things if you are going to secure your cluster with Kerberos:
- Kerberos
- a means of mapping users to groups (usually OS shell-based, but can be LDAP-based)
- OS users for the services to run as, and OS users for the end users whose YARN containers (e.g. MR jobs) will run on the nodes
If I kinit as bgooley@EXAMPLE.COM and then attempt to list a directory that is owned by someone else and has read permission for its owner and group, the NameNode must be able to determine whether I am a member of the group that has permission to list the files. The principal is reduced to a "short name" by trimming off the realm, which gives bgooley. The group membership of user bgooley is then determined (by shell group mapping or LDAP group mapping). See the following for details:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/GroupsMapping.html
This mapping is used by several services, so it is part of core Hadoop.
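To make the two steps concrete, here is a minimal Python sketch of what the lookup amounts to with shell-based mapping. It is a simplification and not Hadoop's implementation: real clusters apply the hadoop.security.auth_to_local rules to get the short name and use the provider set in hadoop.security.group.mapping. The function names short_name and os_groups are made up for the example.

```python
import grp
import pwd
from typing import List


def short_name(principal: str) -> str:
    """Strip the realm (and any /instance part) from a Kerberos principal."""
    primary_and_instance = principal.split("@", 1)[0]
    return primary_and_instance.split("/", 1)[0]


def os_groups(user: str) -> List[str]:
    """Return the user's OS groups, roughly what `id -Gn <user>` reports."""
    # pwd.getpwnam raises KeyError if the user does not exist on this host.
    primary_gid = pwd.getpwnam(user).pw_gid
    names = {grp.getgrgid(primary_gid).gr_name}
    # Supplementary groups list the user in their member field.
    names.update(g.gr_name for g in grp.getgrall() if user in g.gr_mem)
    return sorted(names)


print(short_name("bgooley@EXAMPLE.COM"))
```

Calling os_groups(short_name("bgooley@EXAMPLE.COM")) on the NameNode host would then show the groups a shell-based mapping could see for that user, provided the user exists there.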
Then you have the OS users, which must exist at the OS level so that the various processes can start as those users and own their files. YARN containers will also store information in the OS file system as the user running the job. This means that users who run jobs need to exist on all nodes in the cluster.
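One quick way to check this on a node is to look each account up in the local user database. The sketch below assumes you run it on every node yourself; the helper name missing_users and the account names in the list are only examples.

```python
import pwd
from typing import Callable, Iterable, List


def missing_users(
    usernames: Iterable[str],
    lookup: Callable[[str], object] = pwd.getpwnam,
) -> List[str]:
    """Return the names that the lookup cannot resolve on this host."""
    missing = []
    for name in usernames:
        try:
            lookup(name)  # pwd.getpwnam raises KeyError for unknown users
        except KeyError:
            missing.append(name)
    return missing


# Example account names; replace them with your own service and end users.
print(missing_users(["hdfs", "yarn", "bgooley"]))
```

Since pwd.getpwnam goes through the system's name service configuration, users that come from LDAP via something like SSSD should also resolve here, as long as the node is set up for it.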
Some of these topics are covered in a bit more detail here:
https://www.cloudera.com/documentation/enterprise/latest/topics/sg_auth_overview.html
That's a lot to process, so I'll stop there and wait to see if you have any questions.