Created 11-06-2015 07:04 PM
Bringing up a couple of FAQs:
1) Do we have to use Kerberos? We are OK with AD/LDAP authentication.
2) Will Ranger work without Kerberos? Do we need Kerberos in order for Ranger to secure the cluster?
Created 11-06-2015 07:06 PM
However, the Storm plugin will not work. You need Kerberos for the Storm plugin.
Created 11-06-2015 07:40 PM
+1 @bganesan@hortonworks.com. The recommendation is always to enable Kerberos/Ranger. If someone is unwilling to do Kerberos, show them what happens when you set HADOOP_USER_NAME and I'm sure they will come running.
Created 11-06-2015 07:52 PM
@Ali Bajwa @rgarcia@hortonworks.com
This is a great discussion. Could you share an example of HADOOP_USER_NAME?
Created 11-13-2015 03:37 PM
Wow, did not know this. Thanks.
Created 11-06-2015 07:39 PM
You need Kerberos if you're serious about security. AD/LDAP will cover only a fraction of the components; many other systems require Kerberos for identity (examples: Storm, Kafka, Solr, Spark). One can still keep users in LDAP, but the first line in the infrastructure will be Kerberos.
Created 11-06-2015 08:02 PM
@Andrew Grande Could you elaborate more on "fraction of components"? User A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization.
I agree with you that Kerberos adds more security because of all the benefits/features it comes with.
Created 11-06-2015 08:03 PM
There is no security without Kerberos: with simple authentication, HDFS trusts whatever username the client supplies. Before anyone goes down that road, just show them this first to make sure they are OK with it:
# su yarn
$ whoami
yarn
$ hadoop fs -ls /tmp/hive
ls: Permission denied: user=yarn, access=READ_EXECUTE, inode="/tmp/hive":ambari-qa:hdfs:drwx-wx-wx
$ export HADOOP_USER_NAME=hdfs
$ hadoop fs -ls /tmp/hive
Found 3 items
drwx------   - ambari-qa hdfs          0 2015-11-04 13:31 /tmp/hive/ambari-qa
drwx------   - anonymous hdfs          0 2015-11-04 13:31 /tmp/hive/anonymous
drwx------   - hive      hdfs          0 2015-11-02 11:15 /tmp/hive/hive
Created 11-06-2015 08:05 PM
Thanks for sharing this.
Is it valid for AD logins?
User A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization. Is it possible for user A to export HADOOP_USER_NAME=hdfs and take over permissions?
Created 11-06-2015 08:16 PM
Yes, I tried this on a cluster where NSLCD was set up so the cluster recognizes LDAP users; it would be the same for AD/SSSD.
sh-4.1$ whoami
ali
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
ls: Permission denied: user=ali, access=READ_EXECUTE, inode="/tmp/hive/zeppelin":zeppelin:hdfs:drwx------
sh-4.1$ export HADOOP_USER_NAME=hdfs
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
Found 4 items
drwx------   - zeppelin hdfs          0 2015-09-26 17:51 /tmp/hive/zeppelin/037f5062-56ba-4efc-b438-6f349cab51e4
Created 11-06-2015 09:09 PM
Do you have LDAP enabled for HDFS?
http://hortonworks.com/blog/hadoop-groupmapping-ldap-integration/
Created 11-13-2015 03:30 PM
So I understand there are two Hadoop environment variables for impersonation:
- HADOOP_USER_NAME for non-Kerberos-secured clusters
- HADOOP_PROXY_USER for clusters secured with Kerberos
Will the same issue arise with HADOOP_PROXY_USER? @Ali Bajwa @Neeraj
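From what I can tell, HADOOP_PROXY_USER should not be the same free-for-all: on a Kerberized cluster, impersonation is only honored when the authenticated caller is whitelisted in the proxyuser settings of core-site.xml. A minimal sketch of that whitelist; the "oozie" account, host, and group below are illustrative placeholders, not a recommendation:
<!-- core-site.xml: only a caller authenticated (via Kerberos) as "oozie", -->
<!-- connecting from the listed host, may impersonate users in the listed  -->
<!-- group. All three values are placeholders for your own environment.    -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>edge-node-01.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>analysts</value>
</property>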
Created 11-07-2015 05:52 AM
Thank you for sharing; a simple example to demonstrate the Kerberos requirement.
Created 11-06-2015 09:17 PM
@bganesan@hortonworks.com @bdurai@hortonworks.com
Balaji and Bosco,
Do we need to worry about HADOOP_USER_NAME if we enable LDAP group mapping for HDFS by following this blog? BLOG
No Kerberos in place.
Created 11-06-2015 11:26 PM
Since you brought up this blog, there are three things you need to know: 1. Authentication, 2. User/Group Mapping, and 3. Authorization.
1. For authentication, there is no alternative to Kerberos. Once your cluster is Kerberized, you can make certain access paths easier by using AD/LDAP. For example, access to HS2 via AD/LDAP authentication, or accessing various services using Knox.
2. Group mapping can be done in three ways. One, as the blog says, where you look up AD/LDAP to get the groups for the users (see the sketch after this list). Second, materialize the AD/LDAP users on the Linux servers using SSSD, Centrify, etc. Third, manually create the users and groups in the Linux environment. All these options are applicable regardless of whether you have Kerberos or not.
3. Authorization can be done via Ranger or via the natively supported ACLs. Except for Storm and Kafka, having Kerberos is not mandatory. But without reliable authentication, authorization and auditing are meaningless.
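For reference, here is a minimal sketch of the first group-mapping option in core-site.xml (the AD/LDAP lookup the blog describes). Every value below is a placeholder to adjust for your own directory:
<!-- core-site.xml: resolve a user's groups directly from AD/LDAP.          -->
<!-- URL, bind credentials, and search base are placeholders, not defaults. -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://ad.example.com:389</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=hadoop-bind,ou=services,dc=example,dc=com</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.password</name>
  <value>bind-password-here</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>dc=example,dc=com</value>
</property>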
A common use case like yours: user A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization.
You have to qualify "system". Which system are you logging into? Only HS2 and Knox allow you to log in via AD/LDAP. If you are planning to do that, then you have to set up a very tight firewall around your Hadoop cluster. Absolutely no one should be able to connect to the NameNode, DataNode, or any other service port from outside the cluster, except to the JDBC port of HS2 or the Knox port. If you can set up this firewall, then all business users will be secure even if you don't Kerberize your cluster. However, any user who has shell login/port access to an edge node or the cluster, or is able to submit a custom job in the cluster, will be able to impersonate anyone.
Setting up this firewall is not a trivial thing. Even if you do, there will be users who need access to the cluster. There should be a limited number of such users, and they should be trusted. And you should not let any unapproved job run within the cluster.
If the customer is okay with all these "ifs" and comfortable with a limited number of super-admin users, then yes, you can have security without Kerberos.
Created 11-07-2015 11:12 AM
@bdurai@hortonworks.com Is there a workaround to disable the HADOOP_USER_NAME feature? Also, I noticed that HADOOP_USER_NAME is not honored all the time. In one of my setups, I have LDAP auth in place for HDFS and HS2, and the HADOOP_USER_NAME trick does not work, thankfully.
Created 03-14-2017 08:26 AM
Hi,
I will kerberize my cluster and then install Ranger.
I will connect to Active Directory.
As far as I understand, SSSD is deprecated since Windows Server 2012 R2, so defining the lookup in the core-site.xml file seems to be the best way out. Is that correct?
However, this change is made in the core-site.xml file of HDFS only. How do other applications use the same lookup? Is HDFS the only level where user authorization must be checked? Do, say, Hive or HBase start a process as the authenticated user and then leave the authorization to HDFS? How does it work?
In the blog, another user commented that we must import the cert from the KDC into the default JDK keystore for LDAP:
keytool -importcert -file rootCA.pem -alias kdc -keystore /usr/java/jdk1.8.0_73/jre/lib/security/cacerts
Is this manual step required for Kerberos, AD integrated clusters?
Thanks in advance...