When integrating MIT Kerberos with Active Directory, is it possible to filter the users of the domain you are trusting in any way, e.g. by OU?
Or do all users of the AD domain get access to the machines on the cluster?
Yes, you can filter by OU. All affected components (Ranger, Ambari, sssd) have a property similar to baseDn in Ambari, which sets the LDAP search base.
If you omit the OU from it, you let all users in.
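To illustrate what scoping by base DN means, here is a minimal Python sketch. It is not how Ambari, Ranger, or sssd implement it; the function names and the example DNs are made up for illustration, and the parsing naively assumes there are no escaped commas in the DNs.

```python
def split_dn(dn: str) -> list:
    """Split a DN into lower-cased components (naive: assumes no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def is_under_base_dn(user_dn: str, base_dn: str) -> bool:
    """Return True if user_dn sits at or below base_dn in the directory tree."""
    user_parts = split_dn(user_dn)
    base_parts = split_dn(base_dn)
    if not base_parts:
        # An empty base DN does not restrict anything.
        return True
    if len(base_parts) > len(user_parts):
        return False
    # A DN lists the most specific component first, so the base must match the tail.
    return user_parts[-len(base_parts):] == base_parts


# Hypothetical accounts and base DNs, for illustration only.
alice = "CN=Alice,OU=Hadoop,DC=example,DC=com"
bob = "CN=Bob,OU=Sales,DC=example,DC=com"

print(is_under_base_dn(alice, "OU=Hadoop,DC=example,DC=com"))  # base DN includes the OU
print(is_under_base_dn(bob, "OU=Hadoop,DC=example,DC=com"))    # Bob is in another OU
print(is_under_base_dn(bob, "DC=example,DC=com"))              # OU omitted from the base DN
```

With the OU in the base DN only accounts under that OU are found; with the OU omitted, every account in the domain matches.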
To address the comment by @Robert Levas: If you just kerberize a cluster using AD, then there is no OU filtering, since Kerberos is unaware of OUs. However, if you install, configure and sync Ranger and sssd with AD using LDAP, then you can restrict users by OU. sssd is used to restrict which users can log in to edge nodes, and with Ranger one can authorize those users to use particular features of the cluster. Ditto for Ambari, which has to be configured and synced with AD separately from Ranger and sssd. Note that sssd is just one alternative; others are, for example, Centrify and Winbind. Another component which can be integrated separately with AD is Knox.
Kerberos is an authentication mechanism, not an authorization mechanism. Granted, if a user cannot be authenticated, then that user is typically not authorized to perform tasks on a cluster.
Regarding Active Directory, OUs, and Kerberos... There is no way via the Kerberos infrastructure to limit which users may be authenticated based on what OU in the Active Directory the accounts are under. Maybe there is a way from inside the Active Directory, but that seems unlikely. Keep in mind that Kerberos is an authentication mechanism, not an authorization mechanism.
For authorization, there are some options. Ranger (as @Predrag Minovic pointed out) is probably your best option for this. I am not familiar with all of the options in Ranger, but I would assume that you can achieve your desired authorization model using it. A simpler facility would be to limit which accounts in the Active Directory have local accounts on the Hadoop cluster. As @Predrag Minovic indicated, SSSD may be able to help out here, though I am not familiar with the options for this tool. In any case, local accounts are needed for HDFS to use for access control of the resources created within it. Related to this is massaging the auth-to-local rules that translate Kerberos principal names to local user account names. By default, a rule is created to simply chop off the realm portion of the principal name, resulting in a simple name. For example, "rlevas@EXAMPLE.COM" will be translated to "rlevas". If a local account for "rlevas" exists in the Hadoop cluster, then that user will essentially be authorized to access some resources in HDFS. This model does not work for all services, so Ranger would be a better approach.
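The default translation described above can be sketched in a few lines of Python. This only mimics the behaviour for illustration; it is not Hadoop's auth-to-local rule engine, and the function names and the set of local accounts are hypothetical.

```python
from typing import Optional


def strip_realm(principal: str, realm: str) -> Optional[str]:
    """Map 'name@REALM' to 'name' if the realm matches, otherwise return None."""
    name, sep, principal_realm = principal.rpartition("@")
    if not sep or principal_realm != realm:
        # No realm part, or a realm this rule does not cover.
        return None
    return name


def has_local_account(principal: str, realm: str, local_accounts: set) -> bool:
    """Check whether the translated short name exists as a local account."""
    short_name = strip_realm(principal, realm)
    return short_name is not None and short_name in local_accounts


# Hypothetical list of local accounts on the cluster.
local_accounts = {"rlevas", "hdfs"}

print(strip_realm("rlevas@EXAMPLE.COM", "EXAMPLE.COM"))
print(has_local_account("rlevas@EXAMPLE.COM", "EXAMPLE.COM", local_accounts))
print(has_local_account("someone@EXAMPLE.COM", "EXAMPLE.COM", local_accounts))
```

The point of the sketch is that authentication succeeds for any valid principal, while the presence or absence of a matching local account is what limits access in this simpler model.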
Folks, if I understand correctly, the current responses are conflicting? I am very interested in this question. Which one is correct? Can you filter by OU?
Both answers are correct.
Robert's answer is concerned with authentication, that is, the question: Who am I and how can I prove it? If you have a Kerberos trust from your local MIT Kerberos to AD, it means that you accept any ticket that comes from AD. It's basically like this: in the EU, you can prove your identity in the Netherlands by showing a German (or French) ID.
The second answer deals with authorization, that is, the question: What am I allowed to do in a particular environment? This is not a Kerberos question; it is configured in tools like Ambari and Ranger. And here you can use LDAP filters to restrict authorization to specific users (or specific OUs) only.
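As a rough idea of what such an LDAP filter can look like, here is a small Python sketch that builds a filter selecting only user objects that are members of one group. The helper names and the group DN are assumptions for illustration; `objectClass=user` and `memberOf` are the usual Active Directory attributes, and the resulting string would go into whatever user search filter property the tool you are configuring offers, typically together with an OU-scoped search base.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP filter value (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def group_member_filter(group_dn: str) -> str:
    """Build a filter matching user objects that are members of group_dn."""
    return "(&(objectClass=user)(memberOf={}))".format(escape_filter_value(group_dn))


# Hypothetical group DN, for illustration only.
print(group_member_filter("CN=hadoop-users,OU=Groups,DC=example,DC=com"))
```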