07-23-2018 12:48 AM
We have configured Sentry in our Hadoop cluster. In the Hue interface, as the “hdfs” user, queries in Hive and Impala work well.
We have the following configuration:
- master1: namenode + kerberos server + Sentry + Impala Catalog Server + Hue Server + Hive Metastore/Server2
- master2: namenode (standby) + Kerberos secondary server + Impala StateStore
- worker1: datanode + Impala Daemon
- worker2: datanode + Impala Daemon
Next, we wanted to test with a specific user, so we did the following:
On the master server:
# Create the user on the OS (Ubuntu 14.04)
master1$ sudo adduser dev1
master1$ sudo addgroup sentry_dev
master1$ sudo usermod -a -G sentry_dev dev1
# Create the user's principal in Kerberos
master1$ sudo kadmin.local
kadmin.local: addprinc dev1
kadmin.local: exit
- Create user dev1
- Create group sentry_dev
- Put user dev1 in group sentry_dev
In the Hue query editor:
CREATE ROLE dev_rol;
GRANT ALL ON DATABASE default TO ROLE dev_rol;
GRANT ROLE dev_rol TO GROUP sentry_dev;
After this step, dev1 can access the default database from Hue, but only through Hive. Impala returns the message "User 'dev1' does not have privileges to access: default.*".
After some research, we found that the user needs to exist on each node, so we repeated the OS steps on each of the other nodes (master2, worker1, worker2):
$ sudo adduser dev1
$ sudo addgroup sentry_dev
$ sudo usermod -a -G sentry_dev dev1
Now we can access Impala tables, but this means creating every new user manually on every node. The more users we have, the harder this becomes to manage: for example, 20 users on 10 nodes means 200 accounts to create by hand.
Do you have a better solution? Is there something wrong in our configuration?
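As a stopgap, the manual per-node steps above can at least be scripted. This is a minimal sketch, not a tested tool: it assumes passwordless SSH and passwordless sudo from an admin host to every node, and hard-codes the four node names from the layout above (`NODES` is a hypothetical variable to adjust). It swaps the interactive `adduser`/`addgroup` for the non-interactive `useradd -m` and `groupadd`, and guards each step with `getent`/`id` so that re-running it is harmless. By default it only prints the commands for each node; setting `DRY_RUN=0` pipes them to `bash -s` on each node over SSH.

```bash
#!/usr/bin/env bash
# Sketch: repeat the manual user/group steps on every node over SSH.
# ASSUMPTIONS: passwordless SSH and passwordless sudo on each node, and the
# hypothetical node list below (taken from the cluster layout; adjust it).
# Default is a dry run that only prints; set DRY_RUN=0 to execute.
set -euo pipefail

NODES=(master1 master2 worker1 worker2)

# Print idempotent commands: create GROUP, create USER, add USER to GROUP.
user_setup_cmds() {
  local user="$1" group="$2"
  printf 'getent group %s >/dev/null || sudo groupadd %s\n' "$group" "$group"
  printf 'id -u %s >/dev/null 2>&1 || sudo useradd -m %s\n' "$user" "$user"
  printf 'sudo usermod -a -G %s %s\n' "$group" "$user"
}

# Print (dry run) or execute (DRY_RUN=0) the setup on every node.
push_user() {
  local user="$1" group="$2" node
  for node in "${NODES[@]}"; do
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
      echo "# $node"
      user_setup_cmds "$user" "$group"
    else
      # Feed the commands to a bash shell on the remote node.
      user_setup_cmds "$user" "$group" | ssh "$node" bash -s
    fi
  done
}

if [[ $# -eq 2 ]]; then
  push_user "$1" "$2"
fi
```

For example, `DRY_RUN=0 ./push_user.sh dev1 sentry_dev` (the file name is arbitrary). The Kerberos principal is still created only once with `kadmin.local`, since principals live on the KDC rather than on each node.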
07-23-2018 05:36 AM
07-23-2018 10:51 AM
To create users on multiple nodes, you should use a configuration management tool such as Puppet, Chef, or Ansible.
You only asked about creating a new user on each node, but in practice the requirement usually grows to include the following:
1. Create/modify user at each node
2. Set up a temporary password if you don't have SSO
3. Create/modify multiple user-groups at each node (admin group, developer group, tester group, analyst, etc)
4. Assign each user to the corresponding user-groups
5. Create a home directory for each user, and set up a quota if needed
6. Set the owner and permissions of each home directory (so that other users cannot access it)
These tools can do much more, but I've listed a few tasks based on your requirement. Hope it helps.
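To make the list above concrete, here is a rough sketch of items 1 and 3 to 6 in plain shell (the temporary-password and quota items are left out). The roster format `user:group1,group2`, the file names, and the assumption that home directories live under `/home` are all hypothetical. The script only prints commands meant to run as root on one node, so they can be reviewed before being applied; a tool like Puppet, Chef, or Ansible would instead declare the same desired state and re-apply it to every node.

```bash
#!/usr/bin/env bash
# Sketch of list items 1 and 3-6 (password and quota items omitted).
# Reads a HYPOTHETICAL roster with one "user:group1,group2" entry per line
# and prints idempotent commands to run as root on a node.
# ASSUMPTION: home directories live under /home.
set -euo pipefail

provision_cmds() {
  local roster="$1" user groups g
  while IFS=: read -r user groups; do
    # Skip blank lines and comment lines.
    if [[ -z "$user" || "$user" == \#* ]]; then
      continue
    fi
    # Item 3: make sure every listed group exists.
    for g in ${groups//,/ }; do
      echo "getent group $g >/dev/null || groupadd $g"
    done
    # Items 1 and 5: create the user with a home directory if missing.
    echo "id -u $user >/dev/null 2>&1 || useradd -m $user"
    # Item 4: add the user to the listed groups, keeping existing ones.
    if [[ -n "$groups" ]]; then
      echo "usermod -a -G $groups $user"
    fi
    # Item 6: home directory owned by the user and closed to other users.
    echo "chown $user: /home/$user && chmod 700 /home/$user"
  done < "$roster"
}

if [[ $# -eq 1 ]]; then
  provision_cmds "$1"
fi
```

With a roster line such as `dev1:sentry_dev,developers`, something like `./provision.sh users.txt | ssh worker1 sudo bash -s` would apply it to one node, again assuming passwordless sudo.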
07-24-2018 11:11 AM
As mentioned by others, there are some options to ease the management of users and groups.
Common ones are:
1 - SSSD, IPA, or Centrify OS-level integration, so that the user and group lookups applications make through the OS are answered from a central LDAP source. This requires a good deal of configuration, but it is a robust, enterprise-grade solution
2 - Manage your group and passwd files with automation tools like Puppet, Chef, etc. (modify once, "push out" changes to all hosts)
3 - Configure LdapGroupsMapping in HDFS (the hadoop.security.group.mapping property in core-site.xml) so that Hadoop services do group lookups directly against LDAP.
NOTE: If you intend to let users run jobs directly on YARN, you will still need to create local users on each host running a NodeManager, since containers require the OS user to be present.
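Whichever option you choose, it helps to check what each host actually resolves for a user. With OS-based group mapping, membership is looked up on the host performing the check, which is presumably why the Impala daemons on the workers rejected dev1 until local accounts existed there, while Hive on master1 did not. The sketch below is hypothetical: it assumes passwordless SSH to the listed nodes, an `hdfs` client on the local host, and a valid Kerberos ticket (`kinit`) for `hdfs groups`, which reports groups as resolved by the NameNode rather than by each Impala daemon.

```bash
#!/usr/bin/env bash
# Sketch: check whether USER resolves to GROUP on each node (OS lookup)
# and through Hadoop's group mapping (`hdfs groups`, answered by the NameNode).
# ASSUMPTIONS: passwordless SSH to the hypothetical node list below, an hdfs
# client on this host, and a valid Kerberos ticket (kinit) for `hdfs groups`.
set -euo pipefail

NODES=(master1 master2 worker1 worker2)

# Succeed if GROUP ($1) appears as a whole word in the space-separated list $2.
has_group() {
  local want="$1" g
  for g in $2; do
    if [[ "$g" == "$want" ]]; then
      return 0
    fi
  done
  return 1
}

# `hdfs groups USER` prints "USER : group1 group2"; keep only the groups.
groups_from_hdfs_output() {
  printf '%s\n' "${1#*: }"
}

# report LABEL USER GROUP GROUP_LIST
report() {
  if has_group "$3" "$4"; then
    echo "OK       $1: $2 is in $3"
  else
    echo "MISSING  $1: $2 is not in $3"
  fi
}

check_user() {
  local user="$1" group="$2" node list
  for node in "${NODES[@]}"; do
    list=$(ssh "$node" id -nG "$user" 2>/dev/null || true)
    report "$node (id -nG)" "$user" "$group" "$list"
  done
  list=$(groups_from_hdfs_output "$(hdfs groups "$user" 2>/dev/null || true)")
  report "hdfs groups" "$user" "$group" "$list"
}

if [[ $# -eq 2 ]]; then
  check_user "$1" "$2"
fi
```

For example, `./check_user.sh dev1 sentry_dev` should report `OK` on every node once dev1 resolves through the OS everywhere (local accounts or SSSD). With LdapGroupsMapping alone, only the `hdfs groups` line is expected to pass.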