Can anyone elaborate on the isolation use case in production, please?
Hi @DumindaJ,
by "Isolation" do you mean multi-tenancy-like segregation of resources and data?
If yes, then take a look at
And make sure your cluster is Kerberized, otherwise segregation will be really hard to establish ;)
If not, then please explain in more detail what you mean by "Isolation".
If by isolation you mean multi-tenancy, then you have several levels:
HDFS data isolation can be achieved with Ranger/HDFS policies/ACLs and quotas.
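As a rough illustration of the quota and ACL part, here is a minimal sketch that only builds the `hdfs` command lines for one tenant directory and does not run them. The path `/data/tenant_a`, the group `tenant_a`, and the quota values are made-up examples, and the helper name is hypothetical. Depending on your Hadoop version, ACLs may first need to be enabled with `dfs.namenode.acls.enabled`.

```python
def tenant_isolation_commands(path, group, space_quota, name_quota):
    """Build (but do not run) hdfs commands that fence off one tenant directory.

    All arguments are illustrative; adjust them to your own layout.
    """
    return [
        # Cap the raw disk space (replicas included) the directory tree may use.
        ["hdfs", "dfsadmin", "-setSpaceQuota", space_quota, path],
        # Cap the number of files and directories under the tree.
        ["hdfs", "dfsadmin", "-setQuota", str(name_quota), path],
        # Grant the tenant group access through an HDFS ACL entry.
        ["hdfs", "dfs", "-setfacl", "-m", f"group:{group}:rwx", path],
    ]


if __name__ == "__main__":
    for argv in tenant_isolation_commands("/data/tenant_a", "tenant_a", "10t", 1_000_000):
        print(" ".join(argv))
```

If you use Ranger, the access part would normally be defined as Ranger HDFS policies instead of hand-set ACLs; the quotas still apply either way.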
If you need full resource isolation (CPU and memory), then in addition to YARN queues you will need DominantResourceCalculator and CGroups. By default, YARN schedules containers based on memory requirements only. While this works for a lot of use cases, CPU-intensive workloads like Spark require DominantResourceCalculator so that CPU (vcores) is also taken into account when scheduling. CGroups give you even finer control by enforcing resource isolation at the kernel level.
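To see why memory-only scheduling is a problem for CPU-heavy jobs, here is a small sketch with made-up node and container sizes. It is a simplified model of the idea, not YARN's actual scheduler code, and the function names are hypothetical. In the Capacity Scheduler the switch itself is the property `yarn.scheduler.capacity.resource-calculator`, set to `org.apache.hadoop.yarn.util.resource.DominantResourceCalculator`.

```python
def containers_memory_only(node_mem_mb, container_mem_mb):
    """Containers that fit on a node when only memory is considered."""
    return node_mem_mb // container_mem_mb


def containers_mem_and_cpu(node_mem_mb, node_vcores, container_mem_mb, container_vcores):
    """Containers that fit when both memory and vcores must be satisfied."""
    return min(node_mem_mb // container_mem_mb, node_vcores // container_vcores)


def dominant_share(used_mem_mb, used_vcores, total_mem_mb, total_vcores):
    """Share of whichever resource is used most, relative to the total."""
    return max(used_mem_mb / total_mem_mb, used_vcores / total_vcores)


if __name__ == "__main__":
    # Assumed example: a 64 GB / 16 vcore node, containers asking for 2 GB / 4 vcores.
    node_mem, node_cores = 65536, 16
    c_mem, c_cores = 2048, 4

    by_memory = containers_memory_only(node_mem, c_mem)
    by_both = containers_mem_and_cpu(node_mem, node_cores, c_mem, c_cores)

    print("memory only:", by_memory, "containers,", by_memory * c_cores, "vcores requested")
    print("memory + CPU:", by_both, "containers,", by_both * c_cores, "vcores requested")
    print("dominant share of one container:", dominant_share(c_mem, c_cores, node_mem, node_cores))
```

With these numbers, memory-only placement would allow 32 containers asking for 128 vcores in total on a 16-core node, while accounting for both resources stops at 4 containers. Scheduling this way only decides where containers are placed; actually enforcing the CPU limit at runtime is what CGroups are for.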