Part-1 : Authorization on production cluster

Super Collaborator

I have a freshly installed HDP 2.4 cluster, deployed using Ambari 2.2.2.0 on RHEL 7 machines.

I have tried to depict the usage scenario in a hand-drawn diagram, please bear with it 🙂

[attached diagram: 5202-usage-scenarios.png]

Description :

  1. The authentication, i.e. logging in to the Linux machines where the cluster components run, is handled by an AD-like service
  2. Several roles exist - a Data Scientist would load some data and write Pig scripts, an ETL engineer would import RDBMS schemas into Hive, an Ambari admin would start/stop the Ambari server, and so on
  3. Several users pertaining to one or more roles can exist. All users will have a Linux account in the AD in case they wish to log in via the CLI, e.g. PuTTY. So a Data Scientist would log on to some node using PuTTY, load some data using 'hdfs dfs -copyFromLocal' and then execute some Pig scripts, but he should not be able to CRUD (or even see) the directories/data belonging to the ETL engineer, two Hive users should not see each other's schemas, and so on (see the permissions sketch after this list)
  4. Since everyone uses a browser, people can access the NameNode, ResourceManager and Job History UIs from their Windows/Mac/Linux workstations as valid domain users. It's crucial that only 'authorized' people can browse the file system and check job status, logs and so on, i.e. NO one should be able to browse the file system without authentication and authorization
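
To make the isolation requirement in point 3 concrete, here is roughly what I have in mind at the HDFS permission level (the usernames are just examples):

    # run as the hdfs superuser (or prefix each command with 'sudo -u hdfs')
    hdfs dfs -mkdir -p /user/dsci1 /user/etl1
    hdfs dfs -chown dsci1:dsci1 /user/dsci1
    hdfs dfs -chown etl1:etl1 /user/etl1
    # mode 700: only the owner (and the hdfs superuser) can list or read the directory
    hdfs dfs -chmod 700 /user/dsci1 /user/etl1

With something like this, 'hdfs dfs -ls /user/etl1' run as dsci1 should be denied, while each user can still -copyFromLocal into his own directory.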

Questions/Confusions :

  1. I read several documents - Hadoop in secure mode, the HDFS Permissions Guide, HDP's Ranger approach - but given a fresh cluster with default settings, I'm unsure whether all of these are required or Ranger alone suffices, and HOW to begin
  2. Ideally, like the Linux /home/<username> directory, each user should have his/her own HDFS user space and be restricted to it - unable to even read anything outside it
  3. Given the existing AD-like system, I am unsure whether Hadoop Kerberos authentication is required, but I think Access Control Lists on HDFS would be required - I just don't know how to start there (the sketch after this list is as far as I got from the documentation)
  4. The users and roles will keep expanding, so it should be easy and quick to add/remove/modify users and roles that use the Hadoop ecosystem
  5. Probably a naive question - if Ambari / Ambari + Ranger / Ambari + Ranger + Knox is used, is it necessary to do anything at the Linux level? Is it necessary to switch to the hdfs user on the CLI and play with ACLs and so on?
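
Regarding point 3, the ACL mechanism from the HDFS Permissions Guide looks like the following, but I don't know whether this or Ranger is the right starting point (the path and group name are made up; ACLs also need dfs.namenode.acls.enabled=true in hdfs-site.xml):

    # grant the 'datasci' group read/execute on a directory it does not own
    hdfs dfs -setfacl -m group:datasci:r-x /data/reference
    # make new files and subdirectories inherit the same ACL entry
    hdfs dfs -setfacl -m default:group:datasci:r-x /data/reference
    # inspect the result
    hdfs dfs -getfacl /data/reference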
1 ACCEPTED SOLUTION

Guru

1. Ranger takes care of authorization. You will need something for authentication, which is where Kerberos and AD come in.

2. You can set up /user/<username> in HDFS as each user's home directory. You might still need common HDFS directories where collaboration happens.

3. If you have AD, it already provides Kerberos. If you have write access to an OU in AD, you can create all service-level principals there, so no separate Kerberos KDC will be required. If you don't want to create service-level principals in AD, you can run a local Kerberos KDC and set up a one-way trust with AD.
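
For example, once the cluster is Kerberized against AD, a domain user authenticates before touching HDFS (the username and realm are just placeholders):

    kinit dsci1@CORP.EXAMPLE.COM   # obtain a ticket from AD
    klist                          # verify the ticket
    hdfs dfs -ls /user/dsci1       # works with a ticket; without one the call is rejected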

4. If you enable group-based authorization, adding a user can be as easy as adding the user to the right group and creating a home directory for the user.
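
You can check which groups Hadoop resolves for a given user (the username is just an example) with:

    hdfs groups dsci1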

5. Ranger can take care of most authorization, so you can avoid working with ACLs.
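
Policies are normally created in the Ranger Admin UI, but as a rough sketch they can also be scripted against the Ranger public REST API; the host, credentials, service name and group below are all assumptions for illustration:

    curl -u admin:admin -H "Content-Type: application/json" \
         -X POST http://ranger-host:6080/service/public/v2/api/policy \
         -d '{
               "service": "mycluster_hadoop",
               "name": "datasci_reference_read",
               "resources": { "path": { "values": ["/data/reference"], "isRecursive": true } },
               "policyItems": [ {
                 "groups": ["datasci"],
                 "accesses": [ { "type": "read",    "isAllowed": true },
                               { "type": "execute", "isAllowed": true } ]
               } ]
             }'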

3 REPLIES

Super Collaborator

@Ravi Mutyala

Can you elaborate and help me understand :

  • You can set up /user/<username> in HDFS as each user's home directory. You might still need common HDFS directories where collaboration happens

Does this mean that every time a new user is to be added, someone has to log in as 'hdfs' on the CLI, create an HDFS directory /user/<username> and then change the ownership of that directory?

  • If you have write access to an OU in AD, you can create all service-level principals there

An OU can be created, but what is a 'service-level principal'? Is it a matter of manually creating groups (or users?) like hadoop, hdfs, hive, yarn, sqoop etc. in that OU? The biggest concern I have here is that during cluster installation, under Misc, 'Skip group modifications during install' was left unchecked, so the users and groups were created locally. Is it now required to change that (and how would I do that in Ambari)? If yes, will the cluster still function properly? Can you provide a documentation link?

  • If you enable group-based authorization, adding a user can be as easy as adding the user to the right group and creating a home directory for the user

I'm not sure I understood. I believe the addition of users to a group has to be done at both the Linux and HDFS levels, and this will still involve creating the /user/<username> directory on HDFS manually. Can you provide some detailed inputs here?

Guru

1. If you need home directories for each user, then yes, you need to create those home directories. Ownership can be changed from the CLI, or you can set it via Ranger (though I think changing it from the CLI is better than creating a new Ranger policy for these things).
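
To make that concrete - you don't have to literally log in as 'hdfs'; from any node an admin with sudo can do (the username and group are just examples):

    sudo -u hdfs hdfs dfs -mkdir /user/newanalyst
    sudo -u hdfs hdfs dfs -chown newanalyst:analysts /user/newanalyst
    sudo -u hdfs hdfs dfs -chmod 700 /user/newanalyst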

2. I am talking about Kerberos principals here, not service users (like hdfs, hive, yarn) coming from AD (via SSSD or some similar tool). So, with your setup, local service users are created on each node, but they still need to authenticate with your KDC - that is what the service-level principals are for. Ambari can create them for you in that OU once you give Ambari credentials that can write to it.
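
For illustration, after the Ambari Kerberos wizard runs you can see the resulting service keytabs on each node - e.g. on the NameNode host (the path is the HDP default, the realm is a placeholder):

    klist -kt /etc/security/keytabs/nn.service.keytab
    # shows a principal like nn/<namenode-fqdn>@CORP.EXAMPLE.COM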

3. It's not mandatory to have /user/<username> for each user. We have cases where BI users who connect over ODBC/JDBC don't even have login access to the nodes and don't need /user/<username>. Even users that do log in don't strictly need /user/<username> and could use something like /data/<group>/... to read from and write to HDFS.
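
A sketch of that /data/<group> layout, with made-up group names, where members of a group share a directory instead of individual home directories:

    sudo -u hdfs hdfs dfs -mkdir -p /data/datasci /data/etl
    sudo -u hdfs hdfs dfs -chown -R hdfs:datasci /data/datasci
    sudo -u hdfs hdfs dfs -chown -R hdfs:etl /data/etl
    # mode 770: group members can read/write, everyone else is locked out
    sudo -u hdfs hdfs dfs -chmod 770 /data/datasci /data/etl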