Kerberos, AD/LDAP and Ranger

Bringing up a couple of FAQs:

1) Do we have to use Kerberos? We are OK with AD/LDAP authentication.

2) Will Ranger work without Kerberos? Do we need Kerberos to secure Ranger itself?

1 ACCEPTED SOLUTION

Since you brought up this blog, there are three things you need to know about: 1. Authentication, 2. User/Group Mapping, and 3. Authorization.

1. For authentication, there is no alternative to Kerberos. Once your cluster is Kerberized, you can make certain access paths easier by using AD/LDAP. For example, you can access HS2 via AD/LDAP authentication, or access various services through Knox.

2. Group mapping can be done in three ways. The first, as the blog says, is to look up AD/LDAP to get the groups for each user. The second is to materialize the AD/LDAP users on the Linux servers using SSSD, Centrify, etc. The third is to manually create the users and groups in the Linux environment. All of these options apply regardless of whether you have Kerberos.

3. Authorization can be done via Ranger or with the natively supported ACLs. Except for Storm and Kafka, Kerberos is not mandatory for authorization. However, without reliable authentication, authorization and auditing are meaningless.

Common use case as yours: User A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization.

You have to qualify "system". Which system are you logging in to? Only HS2 and Knox allow you to log in via AD/LDAP. If you plan to do that, you have to set up a very tight firewall around your Hadoop cluster. Absolutely no one should be able to connect to the NameNode, DataNode, or any other service port from outside the cluster, except for the JDBC port of HS2 or the Knox port. If you can set up this firewall, then all business users will be secure even if you don't Kerberize your cluster. However, any user who has shell login or port access to an edge node or the cluster, or who can submit a custom job to the cluster, will be able to impersonate anyone.

Setting up this firewall is not trivial. Even if you do, some users will still need access to the cluster. There should be a limited number of such users, and they should be trusted. And you should not let any unapproved job run within the cluster.
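
The perimeter described above could be sketched roughly as follows. This is only an illustration, not a tested firewall policy: `print_perimeter_rules` is a hypothetical helper, the CIDR is a placeholder, and ports 10000 (HS2 JDBC) and 8443 (Knox) are the usual defaults, which may differ in your cluster. The function prints the iptables rules instead of applying them, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not apply) iptables rules for one cluster node.
# $1 = CIDR of the cluster's internal network (placeholder value in the call below).
print_perimeter_rules() {
  local cluster_cidr="$1"
  local port
  # Local traffic on the loopback interface is always allowed.
  echo "iptables -A INPUT -i lo -j ACCEPT"
  # Keep replies to connections that this node opened itself.
  echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
  # Nodes inside the cluster may reach every service port.
  echo "iptables -A INPUT -s ${cluster_cidr} -j ACCEPT"
  # From outside, only the HS2 JDBC port and the Knox port are reachable
  # (assumed defaults: 10000 and 8443).
  for port in 10000 8443; do
    echo "iptables -A INPUT -p tcp --dport ${port} -j ACCEPT"
  done
  # Everything else (NameNode, DataNode, and so on) is dropped.
  echo "iptables -A INPUT -j DROP"
}

print_perimeter_rules "10.0.0.0/24"
```

Note that these rules would also drop SSH from outside the cluster network, so the few trusted administrators mentioned above would need an explicit exception.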

If the customer is okay with all the "ifs" and comfortable with a limited number of super admin users, then yes, you can have security without Kerberos.

16 REPLIES

  1. No, you don't have to use Kerberos. Technically, you can go with AD/LDAP authentication, and you can also do LDAP authentication over SSL. However, why wouldn't you use Kerberos?
  2. Ranger will work without Kerberos.

However, your Storm plugin will not work. You need Kerberos for the Storm plugin.

+ @bganesan@hortonworks.com The recommendation now is to always enable Kerberos and Ranger. If someone is unwilling to use Kerberos, show them what happens when you set HADOOP_USER_NAME and I'm sure they will come running.

@Ali Bajwa @rgarcia@hortonworks.com

This is a great discussion. Could you share an example of HADOOP_USER_NAME?

Wow did not know this. Thanks.

You need Kerberos if you're serious about security. AD/LDAP will cover only a fraction of the components; many other systems (for example Storm, Kafka, Solr, Spark) require Kerberos for identity. You can still keep users in LDAP, but the first line in the infrastructure will be Kerberos.

@Andrew Grande Could you elaborate on "fraction of components"? User A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization.

I agree with you that Kerberos adds more security because of all the benefits and features it comes with.

There is no security without Kerberos. Before anyone goes down that road, just show them this first to make sure they are OK with it:

# su yarn
$ whoami
yarn
$ hadoop fs -ls /tmp/hive
ls: Permission denied: user=yarn, access=READ_EXECUTE, inode="/tmp/hive":ambari-qa:hdfs:drwx-wx-wx
$ export HADOOP_USER_NAME=hdfs
$ hadoop fs -ls /tmp/hive
Found 3 items
drwx------   - ambari-qa hdfs          0 2015-11-04 13:31 /tmp/hive/ambari-qa
drwx------   - anonymous hdfs          0 2015-11-04 13:31 /tmp/hive/anonymous
drwx------   - hive      hdfs          0 2015-11-02 11:15 /tmp/hive/hive
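
Whether the override above works depends on the cluster's authentication mode. Here is a minimal sketch for checking it, assuming the `hdfs` client is on the PATH. `describe_auth_mode` is a hypothetical helper; `hadoop.security.authentication` is the standard core-site.xml property, with the values `simple` or `kerberos`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: explain what an authentication mode means for the
# HADOOP_USER_NAME override demonstrated in the transcript above.
describe_auth_mode() {
  case "$1" in
    simple)
      echo "simple: the client-supplied user name is trusted, so HADOOP_USER_NAME can impersonate any user"
      ;;
    kerberos)
      echo "kerberos: identity comes from the Kerberos ticket, so HADOOP_USER_NAME is ignored"
      ;;
    *)
      echo "unknown mode: $1"
      ;;
  esac
}

# Only query the cluster configuration when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  describe_auth_mode "$(hdfs getconf -confKey hadoop.security.authentication 2>/dev/null)"
fi
```

If the mode reported is `simple`, any user with shell access to a node that can reach the NameNode can repeat the demonstration.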

@Ali Bajwa

Thanks for sharing this.

Is it valid for AD logins?

User A logs into the system with his AD credentials, and HDFS or Hive ACLs kick in for authorization. Is it possible for user A to export HADOOP_USER_NAME=hdfs and take over those permissions?

Yes, I tried this on a cluster where NSLCD was set up so that the cluster recognizes LDAP users. It would be the same for AD/SSSD:

sh-4.1$ whoami
ali
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
ls: Permission denied: user=ali, access=READ_EXECUTE, inode="/tmp/hive/zeppelin":zeppelin:hdfs:drwx------
sh-4.1$ export HADOOP_USER_NAME=hdfs
sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
Found 4 items
drwx------   - zeppelin hdfs          0 2015-09-26 17:51 /tmp/hive/zeppelin/037f5062-56ba-4efc-b438-6f349cab51e4

So I understand there are two Hadoop environment variables for impersonation:

  • HADOOP_USER_NAME for clusters not secured with Kerberos
  • HADOOP_PROXY_USER for clusters secured with Kerberos.

Will the same issue arise with HADOOP_PROXY_USER? @Ali Bajwa @Neeraj
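
As far as I understand, on a Kerberized cluster HADOOP_PROXY_USER is only honored when the authenticated (real) user has been allowed to act as a proxy user through the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` properties in core-site.xml. A sketch for inspecting those settings follows. `proxyuser_keys` is a hypothetical helper, and `hive` is only an example user name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the core-site.xml keys that control whether
# a given real user may impersonate other users.
proxyuser_keys() {
  local real_user="$1"
  echo "hadoop.proxyuser.${real_user}.hosts"
  echo "hadoop.proxyuser.${real_user}.groups"
}

# Only query the cluster configuration when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  for key in $(proxyuser_keys hive); do
    # Assumption: getconf exits non-zero when the key is not configured.
    value="$(hdfs getconf -confKey "${key}" 2>/dev/null || echo '<not set>')"
    echo "${key} = ${value}"
  done
fi
```

If neither key is set for a user, that user should not be able to proxy for anyone, whatever the environment variable says.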

Thank you for sharing; this is a simple example that demonstrates the Kerberos requirement.

@bganesan@hortonworks.com @bdurai@hortonworks.com

Balaji and Bosco,

Do we need to worry about HADOOP_USER_NAME if we enable LDAP mapping for HDFS by following this blog?

No Kerberos in place.

@bdurai@hortonworks.com Is there a workaround to disable the HADOOP_USER_NAME feature? Also, I noticed that HADOOP_USER_NAME does not always take effect. In one of my setups, I have LDAP auth in place for HDFS and HS2, and the HADOOP_USER_NAME feature does not work, thankfully.

Hi,

I will kerberize my cluster and then install Ranger.

I will connect to Active Directory.

As far as I understand, SSSD has been deprecated since Windows Server 2012 R2, so defining the lookup in the core-site.xml file seems to be the best way out. Is that correct?

However, this change is made only in the core-site.xml file of HDFS. How do other applications use the same lookup? Is HDFS the only level where user authorization must be checked? Do Hive or HBase, say, start the process as the authenticated user and then leave the authorization to HDFS? How does it work?
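
One way to see which lookup is actually in effect is to read the `hadoop.security.group.mapping` property and ask the NameNode which groups it resolves for a user. core-site.xml holds Hadoop's common settings, so as far as I know services built on the Hadoop client libraries read the same property. The sketch below assumes the `hdfs` client is installed; `describe_group_mapping` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name the group-mapping strategy behind a provider class.
describe_group_mapping() {
  case "$1" in
    *LdapGroupsMapping*)
      echo "groups are looked up directly in AD/LDAP"
      ;;
    *ShellBasedUnixGroupsMapping*|*JniBasedUnixGroupsMapping*)
      echo "groups come from the operating system (local users, SSSD, Centrify, ...)"
      ;;
    *)
      echo "other provider: $1"
      ;;
  esac
}

# Only query the cluster when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  describe_group_mapping "$(hdfs getconf -confKey hadoop.security.group.mapping 2>/dev/null)"
  # Ask the NameNode which groups it resolves for the current user.
  hdfs groups "$(whoami)" || echo "could not resolve groups for $(whoami)"
fi
```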

In the blog, another user commented that we must import the cert from the KDC into the default JDK keystore for LDAP:

keytool -importcert -file rootCA.pem -alias kdc -keystore /usr/java/jdk1.8.0_73/jre/lib/security/cacerts

Is this manual step required for Kerberized, AD-integrated clusters?
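
Before repeating the import, it may help to check whether the alias is already in the truststore. The sketch below assumes `JAVA_HOME` points at the JDK used by the cluster and that the truststore still has the default password `changeit`. `find_cacerts` is a hypothetical helper; the alias `kdc` is taken from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate the default truststore under a JDK home.
# JDK 8 keeps it under jre/lib/security, JDK 9 and later under lib/security.
find_cacerts() {
  local java_home="$1"
  local candidate
  for candidate in "${java_home}/jre/lib/security/cacerts" \
                   "${java_home}/lib/security/cacerts"; do
    if [ -f "${candidate}" ]; then
      echo "${candidate}"
      return 0
    fi
  done
  return 1
}

# List the alias only when keytool and a truststore are both available.
if command -v keytool >/dev/null 2>&1 && store="$(find_cacerts "${JAVA_HOME:-}")"; then
  keytool -list -alias kdc -keystore "${store}" -storepass changeit \
    || echo "alias kdc not found in ${store}"
fi
```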

Thanks in advance...
