I'm very confused about how to approach multitenancy in HDFS on a cluster with shared resources.
If I just use Sentry permissions, other areas or clients can still notice the existence of other users or departments in the cluster.
Is there any way to have isolated HDFS resources with a shared-resources approach?
I have looked at HDFS federation, but as far as I understand I would need 6 servers for 6 departments (1 NameNode per department), and I don't think that scales well.
Thank you in advance!
First of all, thank you for your reply 🙂
Yes, just as in Linux. But the point is this: say I have an organization with subdivisions, and the organization doesn't want each subdivision to notice that it is on a shared cluster. Is there any way to do this at the HDFS level?
@Harsh J wrote:
You can prevent directory listing by removing the read-bit on the parent directories.
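To illustrate the suggestion quoted above, here is a minimal sketch in Python of how the mode bits behave for a user who is neither the owner nor in the group of a directory. HDFS follows a POSIX-style model: the read bit on a directory is needed to list it, and the execute bit is needed to reach a child inside it. The `/user` path and the `711` mode are assumptions chosen for illustration, and the check ignores the HDFS superuser, who bypasses permission checks.

```python
import stat

def other_can_list(mode: int) -> bool:
    """Listing a directory needs the read bit for 'other'."""
    return bool(mode & stat.S_IROTH)

def other_can_traverse(mode: int) -> bool:
    """Reaching a child of a directory needs the execute bit for 'other'."""
    return bool(mode & stat.S_IXOTH)

# Assumed example: the mode of /user after `hdfs dfs -chmod 711 /user`.
parent_mode = 0o711

print(other_can_list(parent_mode))      # other tenants cannot list /user
print(other_can_traverse(parent_mode))  # but can still reach their own /user/<name>
```

With a mode like this on the parent, a user can still open their own home directory by its full path, but browsing the parent shows a permission error instead of the list of other tenants.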
I know that with Hive this is not a problem: when you deny permissions on a database, Hive simply doesn't list it.
But with HDFS, if for example an end user is using Hue and has permission to use the File Browser because they have to do occasional uploads or something similar, that user can navigate to the root or the /user folder and notice that the cluster is being shared.
So what I want to know is whether there is any way to isolate this with something like a chroot jail in Linux, or at least whether there is any way to change the HDFS home directory for each user.
Thank you in advance.
I know this is an old topic but we are close to a desperate situation where we want to solve the same problem as you had here almost two years ago.
I was wondering whether you finally solved the problem or whether it is still unsolved. Knowing that would really help us 🙂
There are multiple things we can configure for a multitenant cluster.
It can be done--I'm doing it--but it is a lot of work.
You will need to set up encryption zones in HDFS. Each zone is a separate folder and holds all the data for one organizational unit. The HDFS superuser will be responsible for loading the encrypted data using distcp; either that person or someone else will need to manage the encryption keys as well.
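As a rough sketch of what the per-unit setup could look like, the Python snippet below only builds the command lines for one encryption zone and prints them; it does not run anything. It assumes a working KMS, one key per organizational unit, and a user and group named after the unit. The `/data` base path, the `-key` naming, and the `750` mode are hypothetical choices for illustration, not the exact procedure described in this post.

```python
from typing import List

def zone_commands(unit: str, base: str = "/data") -> List[List[str]]:
    """Build the commands to create one encryption zone for one unit.

    The key name, base path, owner/group, and mode are illustrative assumptions.
    """
    key = f"{unit}-key"
    path = f"{base}/{unit}"
    return [
        # Create the zone key in the KMS (run by a key administrator).
        ["hadoop", "key", "create", key],
        # The zone directory must exist and be empty before zone creation.
        ["hdfs", "dfs", "-mkdir", "-p", path],
        # Turn the directory into an encryption zone (run by the HDFS superuser).
        ["hdfs", "crypto", "-createZone", "-keyName", key, "-path", path],
        # Restrict the directory to the unit's own user and group.
        ["hdfs", "dfs", "-chown", f"{unit}:{unit}", path],
        ["hdfs", "dfs", "-chmod", "750", path],
    ]

for cmd in zone_commands("finance"):
    print(" ".join(cmd))
```

Printing the commands first makes it easy to review them for each department before anyone with superuser or key-admin rights runs them.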
Sentry is a role-based service that works with Hive and other "downstream" services: it cannot be configured to secure HDFS against unauthorized access.
Here is the Cloudera documentation on encryption zones and multi-tenancy that may prove useful: