Created 11-06-2015 07:04 PM
Bringing up a couple of FAQs:
1) Do we have to use Kerberos? We are OK with AD/LDAP authentication.
2) Will Ranger work without Kerberos? Do we need Kerberos to secure Ranger?
Created 11-06-2015 11:26 PM
Since you brought up this blog, there are 3 things you need to know: 1. Authentication, 2. User/Group Mapping, and 3. Authorization.
1. For authentication, there is no alternative to Kerberos. Once your cluster is Kerberized, you can make certain access paths easier by using AD/LDAP. For example, you can access HS2 via AD/LDAP authentication, or access various services through Knox.
2. Group mapping can be done in 3 ways. The first, as the blog says, is to look up AD/LDAP to get the groups for the users. The second is to materialize the AD/LDAP users on the Linux servers using SSSD, Centrify, etc. The third is to manually create the users and groups in the Linux environment. All these options apply regardless of whether you have Kerberos or not.
3. Authorization can be done via Ranger or using the natively supported ACLs. Except for Storm and Kafka, Kerberos is not mandatory for this. However, without reliable authentication, authorization and auditing are meaningless.
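To make point 1 concrete: with LDAP authentication on HS2, the client passes a user name and password, while on a Kerberized HS2 the JDBC URL carries the HiveServer2 service principal and the user authenticates with a ticket. The following is only a sketch. The host hs2.example.com and the realm EXAMPLE.COM are hypothetical placeholders, and port 10000 assumes the default HS2 binary-mode port.

```shell
#!/usr/bin/env bash
# Hypothetical values; replace with your own environment.
HS2_HOST="hs2.example.com"
HS2_PORT=10000            # assumed: default HiveServer2 binary-mode port
REALM="EXAMPLE.COM"

# JDBC URL for LDAP (user name/password) authentication.
ldap_url() {
  printf 'jdbc:hive2://%s:%s/default' "$1" "$2"
}

# JDBC URL for Kerberos: the HS2 service principal is part of the URL.
kerberos_url() {
  printf 'jdbc:hive2://%s:%s/default;principal=hive/_HOST@%s' "$1" "$2" "$3"
}

# Print the beeline commands rather than running them.
echo "LDAP:     beeline -u \"$(ldap_url "$HS2_HOST" "$HS2_PORT")\" -n <ad_user> -p <password>"
echo "Kerberos: kinit <ad_user> && beeline -u \"$(kerberos_url "$HS2_HOST" "$HS2_PORT" "$REALM")\""
```

The script only prints the two command lines, so it is safe to run on a machine without beeline installed.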
Your common use case: User A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization.
You have to qualify "system". Which system are you logging in to? Only HS2 and Knox allow you to log in via AD/LDAP. If you plan to do that, you have to set up a very tight firewall around your Hadoop cluster. Absolutely no one should be able to connect to the NameNode, DataNode, or any other service port from outside the cluster; only the HS2 JDBC port and the Knox port should be reachable. If you can set up this firewall, all business users will be secure even if you don't Kerberize your cluster. However, any user who has shell login or port access to an edge node or the cluster, or who can submit a custom job to the cluster, will be able to impersonate anyone.
Setting up this firewall is not trivial. Even if you do, some users will still need access to the cluster. There should be a limited number of such users, and they should be trusted. You should also not let any unapproved job run within the cluster.
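As a rough illustration of the firewall idea, here is a dry-run sketch that prints iptables rules for a cluster node instead of applying them. The subnet 10.0.0.0/24 is a hypothetical cluster network, and ports 10000 (HS2) and 8443 (Knox) are the usual defaults, assumed here. It deliberately leaves out admin access such as SSH, and in practice this kind of filtering is often done at the network perimeter rather than on each host.

```shell
#!/usr/bin/env bash
# Dry run: prints iptables commands instead of executing them.
CLUSTER_CIDR="10.0.0.0/24"   # hypothetical cluster subnet
HS2_PORT=10000               # assumed: default HiveServer2 JDBC port
KNOX_PORT=8443               # assumed: default Knox gateway port

emit_rules() {
  local cidr="$1"; shift
  # Nodes inside the cluster may talk to each other freely.
  echo "iptables -A INPUT -s $cidr -j ACCEPT"
  # Local loopback traffic is always allowed.
  echo "iptables -A INPUT -i lo -j ACCEPT"
  # Keep replies to connections this node opened itself.
  echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
  # From outside, open only the listed gateway ports.
  local port
  for port in "$@"; do
    echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
  done
  # Everything else (NameNode, DataNode, other service ports) is dropped.
  echo "iptables -A INPUT -j DROP"
}

emit_rules "$CLUSTER_CIDR" "$HS2_PORT" "$KNOX_PORT"
```

Review the printed rules before applying anything; rule order matters because the final DROP catches whatever was not accepted earlier.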
If the customer is okay with all the "ifs" and comfortable with a limited number of super admin users, then yes, you can have security without Kerberos.
Created 11-06-2015 07:06 PM
However, your Storm plugin will not work. You need Kerberos for the Storm plugin.
Created 11-06-2015 07:40 PM
+ @bganesan@hortonworks.com The recommendation now is to always enable Kerberos and Ranger. If someone is unwilling to use Kerberos, show them what happens when you set HADOOP_USER_NAME, and I'm sure they will come running.
Created 11-06-2015 07:52 PM
@Ali Bajwa @rgarcia@hortonworks.com
This is a great discussion. Could you share an example of HADOOP_USER_NAME?
Created 11-13-2015 03:37 PM
Wow, I did not know this. Thanks.
Created 11-06-2015 07:39 PM
You need Kerberos if you're serious about security. AD/LDAP authentication covers only a fraction of the components; many others (for example Storm, Kafka, Solr, and Spark) require Kerberos for identity. You can still keep users in LDAP, but the first line of the infrastructure will be Kerberos.
Created 11-06-2015 08:02 PM
@Andrew Grande Could you elaborate more on "fraction of components"? User A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization.
I agree with you that Kerberos adds more security because of all the benefits/features it comes with.
Created 11-06-2015 08:03 PM
There is no security without Kerberos. Before anyone goes down that road, just show them this first to make sure they are OK with it:
# su yarn
$ whoami
yarn
$ hadoop fs -ls /tmp/hive
ls: Permission denied: user=yarn, access=READ_EXECUTE, inode="/tmp/hive":ambari-qa:hdfs:drwx-wx-wx
$ export HADOOP_USER_NAME=hdfs
$ hadoop fs -ls /tmp/hive
Found 3 items
drwx------ - ambari-qa hdfs 0 2015-11-04 13:31 /tmp/hive/ambari-qa
drwx------ - anonymous hdfs 0 2015-11-04 13:31 /tmp/hive/anonymous
drwx------ - hive hdfs 0 2015-11-02 11:15 /tmp/hive/hive
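The reason this works is that with simple authentication the Hadoop client's claimed user name is trusted, and HADOOP_USER_NAME overrides it; with Kerberos the identity comes from the ticket and the variable is ignored. A quick way to check which mode a client is configured for is sketched below. The helper name spoofable is just illustrative, and the check reads the client-side configuration, so it assumes that matches what the cluster actually enforces.

```shell
#!/usr/bin/env bash
# Map a hadoop.security.authentication value to whether
# HADOOP_USER_NAME would be honored. Illustrative helper only.
spoofable() {
  case "$1" in
    kerberos)  echo "no: identity comes from the Kerberos ticket" ;;
    simple|"") echo "yes: client-supplied user name is trusted" ;;
    *)         echo "unknown mode: $1" ;;
  esac
}

# Only query the configuration when an hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  mode="$(hdfs getconf -confKey hadoop.security.authentication 2>/dev/null)"
  echo "authentication=${mode:-simple} spoofable=$(spoofable "$mode")"
else
  echo "hdfs client not found; skipping check"
fi
```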
Created 11-06-2015 08:05 PM
Thanks for sharing this.
Is it valid for AD logins?
User A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization. Is it possible for user A to export HADOOP_USER_NAME=hdfs and take over permissions?
Created 11-06-2015 08:16 PM
Yes. I tried this on a cluster where NSLCD was set up so that the cluster recognizes LDAP users; it would be the same for AD/SSSD.
sh-4.1$ whoami
ali
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
ls: Permission denied: user=ali, access=READ_EXECUTE, inode="/tmp/hive/zeppelin":zeppelin:hdfs:drwx------
sh-4.1$ export HADOOP_USER_NAME=hdfs
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
Found 4 items
drwx------ - zeppelin hdfs 0 2015-09-26 17:51 /tmp/hive/zeppelin/037f5062-56ba-4efc-b438-6f349cab51e4
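When users are materialized from LDAP/AD through NSLCD or SSSD, it can also help to confirm which groups a user resolves to before testing ACLs. A small sketch, assuming a standard `id` utility and, optionally, an installed hdfs client; the helper name os_groups is only illustrative.

```shell
#!/usr/bin/env bash
# Groups as the local OS sees them (NSLCD/SSSD or /etc/group).
os_groups() {
  id -Gn "$1" 2>/dev/null
}

# Default to the current user when no argument is given.
user="${1:-$(id -un)}"
echo "OS groups for $user: $(os_groups "$user")"

# Groups as Hadoop resolves them; runs only if an hdfs client exists.
if command -v hdfs >/dev/null 2>&1; then
  echo "Hadoop groups for $user: $(hdfs groups "$user" 2>/dev/null)"
fi
```

If the two lists differ, the group mapping configured for Hadoop is not the same as what the OS resolves on that node.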