
Hadoop Security


Following the security article (https://community.hortonworks.com/articles/17336/choosing-kerberos-approach-for-hadoop-cluster-in-a.html#comment-26641), there seem to be three different options for enabling Kerberos on a Hadoop cluster. I'm wondering which of the three Hortonworks recommends.

1. Use an MIT KDC dedicated to the Hadoop cluster, with automated keytab management through Ambari

2. Use an existing enterprise Active Directory (AD), with manual setup

3. Use an existing enterprise AD, with automated management through Ambari

Option 2 definitely seems less preferable than 1 and 3. However, what factors should I consider when choosing between 1 and 3?

Accepted solution


It depends. In reality, many AD teams will not even consider giving admin access to an outside tool (even if it is restricted to an OU). So option 2 is definitely used in practice, even though it is cumbersome.

Regarding options 1 and 3:

An MIT KDC has the advantage of not needing to touch the AD system. You can have your own Kerberos instance for all service users, and you then simply need to set up a one-way trust in which the MIT realm trusts the AD realm, so that your business users can access the cluster too.
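For illustration, a one-way trust of this kind rests on a single cross-realm principal of the form krbtgt/<TRUSTING REALM>@<TRUSTED REALM>, which must exist with the same key in both KDCs. The Python sketch below only derives that principal name; the realm names HADOOP.EXAMPLE.COM and CORP.EXAMPLE.COM are hypothetical, and actually creating the principal and the matching trust on the AD side is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trust:
    """A one-way cross-realm trust: `trusting` accepts users from `trusted`."""

    trusting: str  # realm hosting the services (hypothetical Hadoop MIT realm)
    trusted: str   # realm holding the users (hypothetical corporate AD realm)

    def cross_realm_principal(self) -> str:
        # Users of `trusted` get this ticket-granting ticket from their own KDC
        # and present it to the `trusting` KDC; both KDCs must share its key.
        return f"krbtgt/{self.trusting}@{self.trusted}"

    def reverse(self) -> "Trust":
        # The opposite direction, only needed for a two-way trust.
        return Trust(trusting=self.trusted, trusted=self.trusting)


if __name__ == "__main__":
    one_way = Trust(trusting="HADOOP.EXAMPLE.COM", trusted="CORP.EXAMPLE.COM")
    print(one_way.cross_realm_principal())
    # prints krbtgt/HADOOP.EXAMPLE.COM@CORP.EXAMPLE.COM
```

A two-way trust would also need the mirror principal returned by reverse(), krbtgt/CORP.EXAMPLE.COM@HADOOP.EXAMPLE.COM; the MIT-KDC-plus-AD setup discussed here gets by with the one-way principal.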

Reasons for using an MIT KDC with a trust to AD:

- There are often restrictions on putting service users in the corporate AD.

- Without a local KDC, the whole cluster would stop if the corporate AD somehow became inaccessible.

- You put less load on the corporate AD in large systems.

- For small clusters used by a single team, or clusters that are purely automated (i.e., not directly accessible to many business users), you can use an MIT KDC alone and create local users for the work. No AD is needed at all. This is the fastest and most pain-free way to set up a kerberized cluster (using PAM authentication for Hive and local users in Hue and Ambari). However, this obviously falls down once you need to give access to a large number of business users.

Using AD directly also has big advantages:

- Your business users are normally already in AD and will stay there, so you need to create Hadoop-specific groups anyway and add your users to them. So an MIT KDC, while easier to set up, doesn't really serve any purpose that AD couldn't fulfill on its own.

- The AD team takes care of backup, DR, and security, whereas with your own MIT KDC you would have to handle those yourself.

In general, I think that if the AD team is fairly flexible and accessible, going with AD alone is preferable. You would choose MIT KDC + AD trust if you expect problems there and want to keep as much control as possible within the Hadoop team.
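The rules of thumb in this answer can be condensed into a small decision helper. This is a hypothetical Python sketch of the reasoning above, not an official Hortonworks rule; the enum values and parameter names are illustrative only.

```python
from enum import Enum


class KerberosSetup(Enum):
    MIT_KDC_ONLY = "MIT KDC only, local users"
    MIT_KDC_WITH_AD_TRUST = "MIT KDC + one-way trust with AD"
    AD_WITH_AMBARI = "Existing AD, keytabs managed by Ambari"
    AD_MANUAL = "Existing AD, manual keytab setup"


def choose_setup(
    many_business_users: bool,
    ad_team_flexible: bool,
    service_users_allowed_in_ad: bool,
    ad_grants_admin_to_tools: bool,
    want_hadoop_team_control: bool,
) -> KerberosSetup:
    """Hypothetical encoding of the trade-offs discussed in the answer."""
    # Small single-team or fully automated clusters need no AD at all.
    if not many_business_users:
        return KerberosSetup.MIT_KDC_ONLY
    # Expected friction with AD, or a wish for control within the Hadoop
    # team, favours a local MIT KDC that trusts AD.
    if (want_hadoop_team_control or not ad_team_flexible
            or not service_users_allowed_in_ad):
        return KerberosSetup.MIT_KDC_WITH_AD_TRUST
    # A flexible AD team: use AD alone, automated only if Ambari gets admin rights.
    if ad_grants_admin_to_tools:
        return KerberosSetup.AD_WITH_AMBARI
    return KerberosSetup.AD_MANUAL
```

The order of the checks mirrors the answer: cluster audience first, then how much friction to expect from the AD team, and only then whether Ambari may automate keytab management.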


Replies


@Greenhorn Techie, there are actually only 2 options:

1) MIT KDC (commonly coupled with AD through a one-way trust)

2) Use of the corporate Active Directory

The choice is mainly driven by the customer's requirements and environment. If the customer has Active Directory and wants to integrate the Hadoop cluster with it, you would go with option 2 or 3 from your list.

Having a local MIT KDC fully dedicated to Hadoop and setting up a cross-realm trust with the corporate AD would be the easiest option, as all the service principals are created in the local MIT KDC. This option is also best suited for a large cluster, where many ticket requests are served by the local KDC instead of AD. However, as @Paul Codding mentioned, the real-world problem is the ownership of the local MIT KDC: should it be owned by the Hadoop team or the Kerberos team?
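To see why a local KDC takes load off AD, it helps to list which KDC serves each request. The Python sketch below models a direct one-way trust (no intermediate realms) with hypothetical realm and host names; it ignores ticket caching and renewal, so it only describes a first access.

```python
def kdc_exchanges(client: str, service: str) -> list[tuple[str, str]]:
    """Return (kdc_realm, request) pairs for a first access to `service`.

    `client` is e.g. "alice@CORP.EXAMPLE.COM"; `service` is e.g.
    "hive/node1.example.com@HADOOP.EXAMPLE.COM" (hypothetical names).
    Assumes a direct trust between the two realms.
    """
    client_realm = client.rsplit("@", 1)[1]
    service_realm = service.rsplit("@", 1)[1]

    # The initial login (AS exchange) always goes to the client's own KDC.
    steps = [(client_realm, f"AS-REQ for krbtgt/{client_realm}@{client_realm}")]
    if client_realm != service_realm:
        # Cross-realm: the home KDC issues a TGT for the service realm ...
        steps.append(
            (client_realm, f"TGS-REQ for krbtgt/{service_realm}@{client_realm}")
        )
    # ... and the service ticket itself comes from the service realm's KDC.
    steps.append((service_realm, f"TGS-REQ for {service}"))
    return steps


if __name__ == "__main__":
    # A business user from AD reaching Hive, then a DataNode talking to the NameNode.
    for pair in (
        ("alice@CORP.EXAMPLE.COM", "hive/node1.example.com@HADOOP.EXAMPLE.COM"),
        ("dn/dn1.example.com@HADOOP.EXAMPLE.COM",
         "nn/nn1.example.com@HADOOP.EXAMPLE.COM"),
    ):
        for kdc, request in kdc_exchanges(*pair):
            print(kdc, request)
        print()
```

In this model, service-to-service traffic, which can be substantial in a large cluster, never reaches AD; AD is contacted only when a business user logs in and obtains the cross-realm ticket.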

You go down the first route because it puts the least burden on AD, but it shifts that burden onto the Hadoop Ops/Admin team. The second option (AD only) puts all the burden on AD.
