I'm in the process of securing our 20 Hadoop clusters, which are used by hundreds of users.
Though I have understood (and tested on a Sandbox) the need to deploy Kerberos, Ranger and Knox to secure a cluster properly, this solution looks really complex when it comes to managing multiple clusters and hundreds of users who are referenced in a central LDAP repository.
Indeed, as far as I understand, I need to create as many principals as there are users, generate keytab files per user and per Hadoop service, and distribute those files to the Hadoop clusters. This seems very tedious for the Ops people to set up and maintain.
The global architecture could be as follows:
However, ideally, I would like a user to authenticate with LDAP (as is currently the case, through SSH or Knox) and automatically obtain a Kerberos ticket and the proper authorizations (provided by Ranger) to use the Hadoop services and access their HDFS directories.
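To make this concrete, here is a minimal sketch (in Python) of what I would like the client side to look like: the user presents only LDAP credentials to the Knox gateway over HTTP Basic authentication and never handles a keytab. It assumes Knox's default URL layout (port 8443, gateway path `gateway`). The host name, topology name, user and the helper function `build_knox_webhdfs_request` are hypothetical and only there for illustration. The request is built but not sent.

```python
import base64
import urllib.request


def build_knox_webhdfs_request(gateway_host, topology, hdfs_path,
                               ldap_user, ldap_password, port=8443):
    """Build (without sending) a WebHDFS LISTSTATUS request routed through Knox.

    Hypothetical helper: the URL layout assumes a default Knox installation,
    i.e. https://<host>:<port>/gateway/<topology>/webhdfs/v1/<path>.
    """
    url = (f"https://{gateway_host}:{port}/gateway/{topology}"
           f"/webhdfs/v1{hdfs_path}?op=LISTSTATUS")

    # The user only supplies LDAP credentials, sent as HTTP Basic auth.
    credentials = f"{ldap_user}:{ldap_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Example with made-up values; nothing is sent over the network here.
req = build_knox_webhdfs_request(
    "knox.example.com", "default", "/user/alice", "alice", "secret")
print(req.full_url)
```

In this picture, anything related to Kerberos would have to happen behind the gateway, which is exactly the part I am unsure about.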
From the documentation, I don't see such a "simple" way of working with LDAP. Did I miss something?
Has anyone already set up this kind of infrastructure on enterprise-grade production clusters?
I would be glad to get some feedback or ideas.
Thanks in advance.
To secure services, you would need to kerberize the cluster and set up one-way trust with your AD. Below is the documentation for doing that:
For configuring LDAP/AD authentication for Ambari:
I'm wondering whether it's possible to simplify the architecture by letting Knox retrieve a generic Kerberos ticket as soon as we are authenticated on the Knox gateway with our LDAP user account?
Any clue?
Thanks in advance.