Since you brought up this blog, there are three things you need to know: 1. authentication, 2. user/group mapping, and 3. authorization.
1. For authentication, there is no alternative to Kerberos. Once your cluster is Kerberized, you can make certain access paths easier by using AD/LDAP. For example, access to HS2 via AD/LDAP authentication, or access to various services through Knox.
2. Group mapping can be done in 3 ways. The first, as the blog says, is to look up AD/LDAP to get the groups for the users. The second is to materialize the AD/LDAP users on the Linux servers using SSSD, Centrify, etc. The third is to manually create the users and groups in the Linux environment. All these options are applicable regardless of whether you have Kerberos or not.
3. Authorization can be done via Ranger or using the natively supported ACLs. Except for Storm and Kafka, having Kerberos is not mandatory for this. However, without reliable authentication, authorization and auditing are meaningless.
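To make the first group-mapping option a bit more concrete, here is a rough Python sketch that renders the core-site.xml properties for Hadoop's LdapGroupsMapping. The property names are the standard Hadoop ones as far as I know. The LDAP URL, bind DN, password file path and search base are made-up placeholders, and render_core_site is just a helper name I picked. Treat it as a starting point, not a tested configuration.

```python
import xml.etree.ElementTree as ET

# Placeholder values for illustration only; none of them come from this thread.
LDAP_GROUP_MAPPING = {
    "hadoop.security.group.mapping":
        "org.apache.hadoop.security.LdapGroupsMapping",
    "hadoop.security.group.mapping.ldap.url":
        "ldaps://ad.example.com:636",
    "hadoop.security.group.mapping.ldap.bind.user":
        "cn=hadoop-bind,ou=service,dc=example,dc=com",
    "hadoop.security.group.mapping.ldap.bind.password.file":
        "/etc/hadoop/conf/ldap-bind.password",
    "hadoop.security.group.mapping.ldap.base":
        "dc=example,dc=com",
    # AD-style filters; {0} is replaced by the user name at lookup time.
    "hadoop.security.group.mapping.ldap.search.filter.user":
        "(&(objectClass=user)(sAMAccountName={0}))",
    "hadoop.security.group.mapping.ldap.search.filter.group":
        "(objectClass=group)",
    "hadoop.security.group.mapping.ldap.search.attr.member": "member",
    "hadoop.security.group.mapping.ldap.search.attr.group.name": "cn",
}


def render_core_site(props):
    """Return a <configuration> XML string with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ElementTree escapes '&' in the filter as '&amp;', which XML requires.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_core_site(LDAP_GROUP_MAPPING))
```

The filters shown assume Active Directory; a different directory would need different object classes and attributes.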
A common use case like yours: user A logs into the system with AD credentials, and HDFS or Hive ACLs kick in for authorization.
You have to qualify "system". Which system are you logging into? Only HS2 and Knox allow you to log in via AD/LDAP. If you are planning to do that, then you have to set up a very tight firewall around your Hadoop cluster. Absolutely no one should be able to connect to the NameNode, DataNode or any other service port from outside the cluster, except to the JDBC port of HS2 or the Knox port. If you can set up this firewall, then all business users will be secure even if you don't kerberize your cluster. However, any user who has shell login or port access to an edge node or the cluster, or who is able to submit a custom job in the cluster, will be able to impersonate anyone.
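To show why shell or port access is enough to impersonate someone on a non-Kerberized cluster, here is a toy Python model of how the client identity gets decided. It is not Hadoop code. The function name and arguments are my own, and the principal handling is a simplification of the real auth_to_local rules. The behaviour it models is that with simple authentication the client's HADOOP_USER_NAME environment variable (or else the OS user) is trusted as-is, while with Kerberos the identity comes from the ticket.

```python
def effective_user(auth_mode, env, os_user, kerberos_principal=None):
    """Toy model of which identity a client ends up with on the cluster."""
    if auth_mode == "kerberos":
        if kerberos_principal is None:
            # No ticket means no identity; the request is refused.
            raise PermissionError("no Kerberos ticket")
        # Simplified short-name mapping: "alice/host@EXAMPLE.COM" -> "alice".
        return kerberos_principal.split("@", 1)[0].split("/", 1)[0]
    # Simple auth: whatever the client claims is believed.
    return env.get("HADOOP_USER_NAME") or os_user


if __name__ == "__main__":
    # alice has a shell on an edge node and simply claims to be hdfs.
    print(effective_user("simple", {"HADOOP_USER_NAME": "hdfs"}, "alice"))
    # With Kerberos the same claim is ignored.
    print(effective_user("kerberos", {"HADOOP_USER_NAME": "hdfs"}, "alice",
                         "alice@EXAMPLE.COM"))
```

Run as a script, the first call prints hdfs and the second prints alice, which is the whole point of the firewall and trusted-user caveats.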
Setting up this firewall is not trivial. Even if you do, there will be users who need access to the cluster. There should be a limited number of such users, and they should be trusted. You should also not let any unapproved job run within the cluster.
If the customer is okay with all the "ifs" and comfortable with a limited number of super admin users, then yes, you can have security without Kerberos.
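One way to sanity-check such a firewall is to probe the ports from a machine outside the cluster. Below is a minimal Python sketch of that idea. The port numbers are the usual Hadoop 2.x-era defaults as I remember them and may well differ in your distribution. The host name, the two dictionaries and the function names are placeholders of mine.

```python
import socket

# Assumed default ports; verify against your own cluster configuration.
SHOULD_BE_OPEN = {"HiveServer2 JDBC": 10000, "Knox gateway": 8443}
SHOULD_BE_CLOSED = {
    "NameNode RPC": 8020,
    "NameNode web UI": 50070,
    "DataNode data transfer": 50010,
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def audit(host, probe=is_reachable):
    """List ports that are exposed but should not be, and vice versa."""
    findings = []
    for name, port in SHOULD_BE_CLOSED.items():
        if probe(host, port):
            findings.append(f"EXPOSED: {name} ({port})")
    for name, port in SHOULD_BE_OPEN.items():
        if not probe(host, port):
            findings.append(f"BLOCKED: {name} ({port})")
    return findings


if __name__ == "__main__":
    # Hypothetical host name; run this from OUTSIDE the cluster network.
    for line in audit("hadoop-master.example.com"):
        print(line)
```

An empty result only says that these particular ports look right from where you ran it. It does not cover the other service ports, and it says nothing about users who already have a shell inside the cluster.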
@firstname.lastname@example.org Is there a workaround to disable the HADOOP_USER_NAME feature? Also, I noticed that HADOOP_USER_NAME does not take effect all the time. In one of my setups, I have LDAP auth in place for HDFS and HS2, and the HADOOP_USER_NAME feature does not work, "thankfully".
I will kerberize my cluster and then install Ranger.
I will connect to Active Directory.
As far as I understand, SSSD has been deprecated since Windows Server 2012 R2, so defining the lookup in the core-site.xml file seems to be the best way out. Is that correct?
However, this change is made only in the core-site.xml file of HDFS. How do other applications use the same lookup? Is HDFS the only level where user authorization must be checked? Do Hive or HBase, for example, start a process as the authenticated user and then leave the authorization to HDFS? How does it work?
In the blog, some other user commented that we must import the cert from the KDC into the default JDK keystore for LDAP.
keytool -importcert -file rootCA.pem -alias kdc -keystore /usr/java/jdk1.8.0_73/jre/lib/security/cacerts
Is this manual step required for Kerberized, AD-integrated clusters?
Thanks in advance...