Created on 02-23-2018 09:30 PM
How HDFS Applies Ranger Policies
Apache Ranger™ is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data-lake architecture, and enterprises can run multiple workloads in a multi-tenant environment. Data security within Hadoop needs to evolve to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access.
Ranger Goals Overview
Apache Ranger maintains various types of rule mappings. The general layout looks like this:
1. User -> groups -> policy -> actual resource (HDFS paths, Hive tables): access/deny, read/write
2. User -> policy -> actual resource (HDFS paths, Hive tables): access/deny, read/write
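The two mapping styles above can be sketched as a simple lookup. This is an illustrative model only (the user, group, and policy names are hypothetical, and this is not Ranger's actual API):

```java
import java.util.*;

// Toy model of Ranger's two mapping styles: user -> groups -> policy,
// and user -> policy directly. All names are hypothetical.
public class PolicyMappingSketch {
    // group -> policies granting access to some resource
    static final Map<String, List<String>> groupPolicies = Map.of(
            "etl-team", List.of("hdfs-read-sales"),
            "admins", List.of("hdfs-rw-all"));
    // user -> groups (as synced into Ranger)
    static final Map<String, List<String>> userGroups = Map.of(
            "alice", List.of("etl-team"));
    // user -> policies mapped directly (style 2)
    static final Map<String, List<String>> userPolicies = Map.of(
            "bob", List.of("hive-read-logs"));

    // Collect every policy reachable from a user via either mapping style.
    static Set<String> policiesFor(String user) {
        Set<String> result = new HashSet<>(userPolicies.getOrDefault(user, List.of()));
        for (String g : userGroups.getOrDefault(user, List.of())) {
            result.addAll(groupPolicies.getOrDefault(g, List.of()));
        }
        return result;
    }
}
```

A user with no groups and no direct mapping simply resolves to no policies, which is exactly the situation described later where Hadoop ACLs take over.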
Key Takeaways About Ranger
1. Ranger is not an identity management system; it is a service that holds the policy mappings.
2. Ranger is not concerned with the actual relationship between user names and group names.
3. You can create a dummy group and attach it to a user; Ranger does not check whether this relationship exists in LDAP.
4. Ranger users and groups are synced from the same LDAP that powers the rest of the Hadoop cluster.
5. It is this common LDAP, shared between Ranger and the Hadoop cluster, that enables them to see the same users.
6. Nowhere does Ranger claim to know all the users present on the cluster; it is the job of the Ranger usersync to sync users and groups into Ranger.
The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.
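The NameNode's role described above can be sketched as a tiny in-memory lookup. This is a toy model with illustrative names, not HDFS's actual classes:

```java
import java.util.*;

// Toy model of the NameNode's role: it holds only metadata
// (path -> DataNode locations of the blocks), never the file data itself.
public class NameNodeSketch {
    static final Map<String, List<String>> blockLocations = new HashMap<>();

    static {
        // Hypothetical file whose blocks are replicated on three DataNodes.
        blockLocations.put("/data/sales.csv", List.of("dn1", "dn2", "dn3"));
    }

    // A client asks the NameNode where a file's blocks live,
    // then reads the actual bytes from the DataNodes directly.
    static List<String> locate(String path) {
        return blockLocations.getOrDefault(path, List.of());
    }
}
```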
Key Takeaways.
1. The NameNode is where HDFS maintains a file's metadata.
2. While reading or writing a file, HDFS clients interact with the NameNode to get the locations of the file blocks on the various DataNodes, and then interact with those DataNodes directly.
3. All file permission checks for HDFS happen at the NameNode.
4. The NameNode maintains POSIX-style permissions (user : group : other) but also supports fine-grained access via Hadoop ACLs. Follow the link for an interesting perspective comparing HDFS to Linux ext3.
5. Setting dfs.namenode.acls.enabled = true enables ACLs on the NameNode.
6. To learn more about Hadoop ACLs, follow the link.
7. Hadoop POSIX permissions are not sufficient to express all possible permissions applicable to a given file or directory.
8. To set or remove ACLs, use hdfs dfs -setfacl; to view them, use hdfs dfs -getfacl.
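ACL support has to be switched on before -setfacl will work; a minimal hdfs-site.xml fragment:

```xml
<!-- hdfs-site.xml: enable HDFS ACLs on the NameNode -->
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
```

After a restart of the NameNode, ACLs can be managed with commands such as hdfs dfs -setfacl -m user:alice:r-x /data (the user name and path here are illustrative) and inspected with hdfs dfs -getfacl /data.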
How the NameNode and Ranger Interact
HDFS permission checks happen when an HDFS client interacts with the NameNode. The NameNode has both its own ACLs and Ranger policies to apply. Permission evaluation starts with the Ranger policies and then falls back to the Hadoop ACLs.
How it all works (doAs=true, impersonation enabled):
1. Ranger policies are fetched by the NameNode and maintained in a local cache. Note that the HDFS Ranger plugin is not a separate process, but a library executed within the NameNode.
2. The user authenticates to the NameNode using one of the configured authentication mechanisms: simple or Kerberos.
3. The NameNode gets the username during the authentication phase. Remember that even with Kerberos authentication, the groups available in the ticket are never used.
4. Depending on how core-site.xml is configured, the NameNode either queries LDAP to fetch the authenticated user's groups, or looks them up from the underlying OS (NSS -> SSSD -> LDAP).
5. Once the groups are fetched, the NameNode has the user-to-groups mapping for the authenticated user.
6. The HDFS Ranger plugin has the user -> groups -> policy mapping; the groups fetched by the NameNode are now used to select the applicable Ranger policies and enforce them.
7. Note that Ranger might map one user to three groups, and those three groups to three policies. Not all of the policies mapped to the three groups will be applied by default.
8. The NameNode fetches groups on its own (from LDAP or the OS), and only the groups that overlap with the groups in the Ranger rules are used when enforcing the policies.
9. The NameNode (the HDFS Ranger plugin library) writes audit logs locally, which are eventually pushed to the Ranger audit service (Solr).
10. If for some reason the NameNode cannot fetch groups for the authenticated user, none of the Ranger policies mapped to those groups will be applied.
11. Sometimes mapping users to policies directly helps mitigate issues when LDAP is not working correctly.
12. Note that all the mappings here are in terms of group names, not GIDs, since there can be scenarios where a GID is available on the OS but no group name.
13. If there are no Ranger policies matching the user, the Hadoop ACLs are applied and the appropriate permission is enforced.
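The decision flow above can be sketched as follows. This is a simplified illustration with hypothetical group names and a stubbed ACL check, not the plugin's real implementation: the groups come from the NameNode's own lookup, only groups that overlap with the Ranger policy cache select policies, and when no policy matches, Hadoop ACLs decide:

```java
import java.util.*;

// Sketch of the NameNode-side check: Ranger policies first, Hadoop ACLs as fallback.
// Group names, actions, and the ACL stub are all illustrative.
public class RangerCheckSketch {
    // Ranger policy cache, simplified to group -> allowed actions on one path.
    static final Map<String, Set<String>> policyByGroup = Map.of(
            "analysts", Set.of("read"),
            "etl", Set.of("read", "write"));

    // Fallback POSIX/ACL check, stubbed: everyone may read, only "hdfs" may write.
    static boolean hadoopAclAllows(String user, String action) {
        return "read".equals(action) || "hdfs".equals(user);
    }

    static boolean checkPermission(String user, Set<String> namenodeGroups, String action) {
        boolean matchedPolicy = false;
        // Only groups that overlap with the Ranger rules participate.
        for (String g : namenodeGroups) {
            Set<String> allowed = policyByGroup.get(g);
            if (allowed != null) {
                matchedPolicy = true;
                if (allowed.contains(action)) return true;
            }
        }
        // No Ranger policy matched this user's groups: Hadoop ACLs decide.
        return !matchedPolicy && hadoopAclAllows(user, action);
    }
}
```

Note how an empty group set (step 10 above) means no Ranger policy can match, so the outcome rests entirely on the Hadoop ACLs.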
In hdfs-site.xml, dfs.namenode.inode.attributes.provider.class is set to org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer. RangerHdfsAuthorizer calls checkPermission, which internally gets the groups of the authenticated user using the UserGroupInformation class.
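As a configuration fragment, the hook looks like this:

```xml
<!-- hdfs-site.xml: plug the Ranger authorizer into the NameNode -->
<property>
  <name>dfs.namenode.inode.attributes.provider.class</name>
  <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
```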
Code Flow :
checkPermission: from the username, get the groups and check privileges.

String user = ugi != null ? ugi.getShortUserName() : null;
Set<String> groups = ugi != null ? Sets.newHashSet(ugi.getGroupNames()) : null;

Getting groups from the username:

Set<String> groups = ugi != null ? Sets.newHashSet(ugi.getGroupNames()) : null;
UserGroupInformation: the core class used to authenticate users and get their groups (Kerberos authentication, LDAP, PAM).
Groups: if nothing is specified in core-site.xml, this invokes a shell to get the groups for the user.
ShellBasedUnixGroupsMapping: the default implementation.
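The group-mapping implementation is chosen in core-site.xml via hadoop.security.group.mapping; a fragment pinning the shell-based mapper explicitly (org.apache.hadoop.security.LdapGroupsMapping would instead query LDAP directly):

```xml
<!-- core-site.xml: group mapping used by the NameNode when resolving a user's groups -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
```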