How is the Isolation done in Production ?

Can anyone elaborate on the isolation use case in production, please?

Re: How is the Isolation done in Production ?

Guru

Can you elaborate on what you mean by 'isolation'?

Re: How is the Isolation done in Production ?

Guru

Hi @DumindaJ,

by "isolation" do you mean multi-tenancy-like segregation of resources and data?

If yes, then take a look at

  • YARN queues for resource segregation, and
  • Ranger-HDFS-policies for authorization of HDFS folders/data

And ensure your cluster is kerberized, otherwise segregation will be really hard to establish ;)

If not, then please explain in more detail what you mean by "Isolation"

Regards....

Re: How is the Isolation done in Production ?

Thanks. What I mean by isolation here is a multi-tenant environment.

Re: How is the Isolation done in Production ?

Hi @DumindaJ,

If by isolation you mean multi-tenancy, then you have several levels:

  • Resource isolation: this is achieved through the YARN Capacity Scheduler. You can define several queues and allocate resources (RAM, CPU) to them. Jobs and applications are submitted to a queue and use the resources allocated to that queue. You can define queues by department (Marketing, R&D), by application (fraud detection, etc.) or by type of workload (batch, real time, ML). There are other interesting features such as elasticity, which lets a queue use capacity beyond its allocation when other queues are idle. I recommend looking at the YARN documentation for this.
  • Security: there are several things to consider. The first is authentication, to make sure that users are who they claim to be (Kerberos). The second is access control, to check whether a user has the right to access production data. Access control applies to submitting jobs too: does a user have the right to submit a job to the marketing queue? Access control for all the different tools (HDFS, Kafka, Hive, HBase, etc.) is done with Ranger.
  • Storage quota: you may also be interested in enforcing storage limits with HDFS quotas.
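
To make the queue idea a bit more concrete, here is a minimal Python sketch that derives Capacity Scheduler properties for two tenant queues. The queue names (`marketing`, `rnd`) and the percentages are made up for illustration, and the helper `capacity_scheduler_properties` is hypothetical. The property keys follow the `yarn.scheduler.capacity.*` naming used in `capacity-scheduler.xml`. Setting `maximum-capacity` above `capacity` is what allows the elasticity mentioned in the list.

```python
# Sketch only: queue names and percentages are hypothetical examples.
# The keys follow the yarn.scheduler.capacity.* naming of capacity-scheduler.xml.

def capacity_scheduler_properties(queues):
    """Build properties from a mapping: queue name -> (capacity %, maximum-capacity %)."""
    total = sum(cap for cap, _ in queues.values())
    if total != 100:
        # Sibling queue capacities under the same parent must add up to 100.
        raise ValueError(f"queue capacities must sum to 100, got {total}")

    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, (cap, max_cap) in queues.items():
        if max_cap < cap:
            raise ValueError(f"{name}: maximum-capacity is below capacity")
        prefix = f"yarn.scheduler.capacity.root.{name}"
        props[f"{prefix}.capacity"] = str(cap)
        # A maximum-capacity above capacity lets the queue borrow idle resources.
        props[f"{prefix}.maximum-capacity"] = str(max_cap)
    return props


if __name__ == "__main__":
    tenants = {"marketing": (40, 60), "rnd": (60, 100)}
    for key, value in capacity_scheduler_properties(tenants).items():
        print(f"{key}={value}")
```

The printed key/value pairs would still have to be placed into the scheduler configuration by whatever tool manages your cluster; the sketch only checks that the numbers are consistent.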

Does this answer your question?

Re: How is the Isolation done in Production ?

Yes thanks

Re: How is the Isolation done in Production ?

@DumindaJ Great, happy it helped. Please accept the answer as it can be helpful for other users that look for the same information.

Re: How is the Isolation done in Production ?

Guru

HDFS data isolation can be achieved with Ranger/HDFS policies/ACLs and quotas.

If you need full resource isolation (CPU and memory), then in addition to YARN queues you will need the DominantResourceCalculator and CGroups. By default, YARN schedules based on memory requirements only. While this works for many use cases, CPU-intensive workloads such as Spark need the DominantResourceCalculator so that CPU is taken into account when scheduling. CGroups give you even finer control by enforcing resource isolation at the kernel level.
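
As a rough illustration of why the calculator matters, the Python sketch below compares a memory-only view of usage with a dominant-resource view, where an application's share is the largest fraction it holds of any single cluster resource. The cluster size and the two applications are invented numbers, and the function names are hypothetical; this is not YARN's code. On the Capacity Scheduler the calculator is selected with the `yarn.scheduler.capacity.resource-calculator` property, set to `org.apache.hadoop.yarn.util.resource.DominantResourceCalculator`.

```python
# Sketch only: cluster size and application demands are invented numbers.

def memory_share(used, cluster):
    """Memory-only view: CPU usage is ignored entirely."""
    return used["memory_mb"] / cluster["memory_mb"]


def dominant_share(used, cluster):
    """Dominant-resource view: the largest fraction across all resources."""
    return max(used[res] / cluster[res] for res in cluster)


cluster = {"memory_mb": 100_000, "vcores": 100}
spark_app = {"memory_mb": 10_000, "vcores": 40}   # CPU-heavy
batch_app = {"memory_mb": 30_000, "vcores": 10}   # memory-heavy

if __name__ == "__main__":
    for name, app in (("spark_app", spark_app), ("batch_app", batch_app)):
        print(name,
              "memory-only:", memory_share(app, cluster),
              "dominant:", dominant_share(app, cluster))
```

With memory-only accounting the CPU-heavy application looks like the smaller consumer, while the dominant-resource view ranks it as the larger one because it holds 40% of the vcores.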

More information on CPU scheduling on HDP is here. More information on CGroups is here.
