05-10-2017 04:43 AM - edited 05-10-2017 04:50 AM
I searched for an answer for an hour and did not find a clear solution.
Here is my infrastructure:
- Cloudera CDH5.10
- Kerberos against ADS
- Hue and CM against LDAP (ADS)
Here's the question:
With my current configuration everything runs fine when I use one of the technical users that exist in both LDAP and the local OS, but when I log in to Hue as an LDAP-only user I can only do my HDFS stuff.
As soon as I try to run Impala or a Hive/MR job, I get the "yarn: user not found" exception. As soon as I add the user to all nodes, it works fine.
Do I really need to sync/create all my LDAP users on all my cluster nodes, or is it possible to log in as an LDAP user and have Hue use an internal technical account for Impala and MR?
And of course: how and where does that need to be configured?
Thanks in advance.
05-10-2017 12:06 PM
Thanks for your answer.
But that would mean we need to sync hundreds of users and groups to all of our cluster nodes to provide a data lake with full lineage, governance, and fine-grained security...
Isn't there another solution?
How do other enterprise customers solve this? I cannot imagine that big customers sync hundreds or thousands of users to their local OS machines.
05-10-2017 02:18 PM
@kaefaetz, you are correct that the management overhead for many users is a problem. One approach might be to use an automation tool like Puppet or Chef to update users on each host.
Many administrators who face this issue use software that plugs into the OS and handles user/group lookups by retrieving them from Active Directory or other sources.
SSSD, Centrify (costs money), and FreeIPA are a few examples of such solutions.
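With SSSD, OS user and group lookups go through the Name Service Switch, so a host can resolve AD accounts without local entries. As a rough sketch (assuming Python 3 on the host; the function name `nss_sources` and the sample file contents are made up for illustration), this is one way to check which sources a host consults for `passwd` and `group`:

```python
import re

def nss_sources(nsswitch_text, database):
    """Return the lookup sources listed for one database in nsswitch.conf text."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or ":" not in line:
            continue
        name, _, rest = line.partition(":")
        if name.strip() == database:
            # Remove action items such as [NOTFOUND=return]; keep source names.
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []

# Illustrative contents only, not copied from a real cluster.
SAMPLE = """
passwd:     files sss
group:      files sss
hosts:      files dns
"""

print(nss_sources(SAMPLE, "passwd"))  # ['files', 'sss']
```

On a real host you would pass in the contents of `/etc/nsswitch.conf`. Seeing `sss` next to `files` suggests that account lookups are also routed to SSSD.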
There is some information about this on the following page:
05-18-2017 11:39 AM
I tested a bit further. Is there an easy answer for why Impala works with Kerberos via Hue (count(*)...) and Hive does not?
Am I right that it is YARN that requires the local OS users?
05-24-2017 12:25 AM
OK, Impala does not run on top of YARN, and for YARN applications such as Spark or Hive every user needs to exist in the local OS on every node.
But why does Impala work without the OS users when Kerberos and Sentry are activated?
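A quick way to see which accounts a given node can resolve is to ask the OS directly. This is only a sketch, assuming Python 3 on a Linux host; the function name `missing_os_users` and the account names are hypothetical. The lookup goes through the normal OS name service, so it covers `/etc/passwd` as well as sources such as SSSD if they are configured:

```python
import pwd

def missing_os_users(usernames, lookup=pwd.getpwnam):
    """Return the names that the local OS cannot resolve to an account."""
    missing = []
    for name in usernames:
        try:
            lookup(name)  # raises KeyError when the account is unknown
        except KeyError:
            missing.append(name)
    return missing

# Hypothetical LDAP logins; replace with real account names.
print(missing_os_users(["ldapuser1", "ldapuser2"]))
```

Running this on each NodeManager host shows which users would still hit the "user not found" error there.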