Explorer
Posts: 9
Registered: ‎05-12-2016

Concept of HDFS multi tenancy.

Hello community,

I'm very confused about how to approach multi-tenancy in HDFS on a cluster with shared resources.

If I just use Sentry permissions, other areas or clients can still notice the existence of other users or departments in the cluster.

Is there any way to have isolated HDFS resources with a shared-resources approach?

I have looked at HDFS federation, but as far as I understand I would need 6 servers just for 6 departments (1 NameNode per department), and I don't think this scales well.

Thank you in advance!

Posts: 1,827
Kudos: 406
Solutions: 292
Registered: ‎07-31-2013

Re: Concept of HDFS multi tenancy.

You can prevent directory listing by removing the read-bit on the parent directories.

Could you explain further how your multi-tenancy is structured over HDFS/Hive, so that more specific suggestions can be made based on your design and goals?
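
To illustrate the read-bit suggestion: in the HDFS permission model, listing a directory needs the read bit, while reaching a child of that directory needs only the execute bit. So clearing read for group and other, while leaving execute in place, hides the listing but still lets each user reach their own subdirectory. The following is a minimal Python sketch that only computes the new mode and builds the `hdfs dfs -chmod` command line without running it. The helper names and the `/user` target are assumptions for illustration, and the starting mode of 755 is assumed, not taken from any real cluster.

```python
import stat
from typing import List


def strip_read_for_others(mode: int) -> int:
    """Clear the read bit for group and other; leave every other bit as it is."""
    return mode & ~(stat.S_IRGRP | stat.S_IROTH)


def chmod_command(path: str, mode: int) -> List[str]:
    """Build (but do not execute) an `hdfs dfs -chmod` invocation."""
    return ["hdfs", "dfs", "-chmod", format(mode, "o"), path]


def other_can_list(mode: int) -> bool:
    """Listing a directory requires the read bit."""
    return bool(mode & stat.S_IROTH)


def other_can_traverse(mode: int) -> bool:
    """Reaching a child of a directory requires the execute bit."""
    return bool(mode & stat.S_IXOTH)


if __name__ == "__main__":
    # Assumed starting mode for the shared parent directory.
    new_mode = strip_read_for_others(0o755)
    print(" ".join(chmod_command("/user", new_mode)))
    print("other can list:", other_can_list(new_mode))
    print("other can traverse:", other_can_traverse(new_mode))
```

Note that this only blocks listing. A user who already knows or guesses another tenant's directory name can probably still tell that it exists, and the HDFS superuser bypasses these checks entirely.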
Explorer
Posts: 9
Registered: ‎05-12-2016

Re: Concept of HDFS multi tenancy.

Hello Harsh,

First of all, thank you for your reply :)


@Harsh J wrote:
You can prevent directory listing by removing the read-bit on the parent directories.

Yes, just as in Linux, but the point is this: if I have, for example, an organization with subdivisions, and the organization doesn't want each subdivision to notice that it is on a shared cluster, is there any way to do this at the HDFS level?

I know that with Hive this is not a problem: when you deny permissions on a database, Hive simply doesn't list it.

But with HDFS, if for example an end user works in Hue and has permission to use the File Browser because they need to do occasional uploads, that user can navigate to the root or to the /user folder and notice that the cluster is being shared.

 

So what I want to know is whether there is any way to isolate this with something like a chroot jail in Linux, or at least whether there is a way to change the HDFS home directory for each user.

Thank you in advance.

New Contributor
Posts: 1
Registered: ‎02-07-2017

Re: Concept of HDFS multi tenancy.

Hello Glize,

I know this is an old topic, but we are close to a desperate situation: we want to solve the same problem you had here almost two years ago.

I was wondering whether you finally solved the problem or whether it is still unsolvable. That would really help us :)

Thanks!

Alfredo.

New Contributor
Posts: 4
Registered: ‎11-20-2018

Re: Concept of HDFS multi tenancy.

Hi Glize,

There are several things you can configure for a multi-tenant cluster:

  1. Dynamic resource pool allocation, that is, the scheduler.
  2. Sentry (RBAC)
  3. ACLs, etc.
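
As a rough illustration of the ACL item, here is a small Python sketch that builds the shell commands for giving each department's group access to its own directory only. It prints nothing to the cluster and runs nothing; the directory layout (`/data/dept_a`) and the group names are made up for the example, and it assumes ACLs are enabled on the NameNode (`dfs.namenode.acls.enabled`).

```python
from typing import Dict, List


def acl_commands(tenants: Dict[str, str]) -> List[List[str]]:
    """Build (but do not execute) per-tenant permission commands.

    `tenants` maps an HDFS directory to the group allowed to read it.
    Both the paths and the group names are hypothetical.
    """
    commands = []
    for path, group in sorted(tenants.items()):
        # Restrict the directory to its owner first...
        commands.append(["hdfs", "dfs", "-chmod", "700", path])
        # ...then grant read and traverse access to the tenant's group via an ACL entry.
        commands.append(
            ["hdfs", "dfs", "-setfacl", "-m", f"group:{group}:r-x", path]
        )
    return commands


if __name__ == "__main__":
    example = {"/data/dept_a": "dept_a", "/data/dept_b": "dept_b"}
    for cmd in acl_commands(example):
        print(" ".join(cmd))
```

Running `hdfs dfs -getfacl` on a directory afterwards is a simple way to check which entries actually ended up on it.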
Explorer
Posts: 15
Registered: ‎01-31-2019

Re: Concept of HDFS multi tenancy.

It can be done--I'm doing it--but it is a lot of work.

 

You will need to set up encryption zones in HDFS. Each zone is a separate folder and holds all the data for one organizational unit. The HDFS superuser will be responsible for loading the encrypted data using distcp; either that person or someone else will need to manage the encryption keys as well.
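
A minimal sketch of the per-unit setup, written in Python only to generate the commands (nothing is executed). It assumes a KMS is already configured for the cluster; the `/tenants` base directory and the `<unit>_key` naming are hypothetical choices for this example, not something prescribed by HDFS. The zone directory has to be empty when the zone is created, which is why the directory is created right before `-createZone`.

```python
from typing import List


def encryption_zone_commands(unit: str, base_dir: str = "/tenants") -> List[List[str]]:
    """Build (but do not execute) the commands to set up one encryption zone.

    `unit` is an organizational unit name; `base_dir` and the key naming
    scheme are assumptions made for illustration.
    """
    key_name = f"{unit}_key"
    zone_path = f"{base_dir}/{unit}"
    return [
        # 1. Create the key for this unit in the key provider.
        ["hadoop", "key", "create", key_name],
        # 2. Create the (still empty) directory that will become the zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Turn the directory into an encryption zone tied to that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


if __name__ == "__main__":
    for unit in ["dept_a", "dept_b"]:
        for cmd in encryption_zone_commands(unit):
            print(" ".join(cmd))
```

Which users may use which key is controlled separately on the key management side, so the key handling mentioned above still has to be planned per unit.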

 

Sentry is a role-based service that works with Hive and other "downstream" services: it cannot be configured to secure HDFS against unauthorized access.

 

Here is the Cloudera documentation on encryption zones and multi-tenancy that may prove useful:

 

https://www.cloudera.com/documentation/enterprise/5-16-x/topics/encryption_planning.html

 

David
