Created 10-10-2016 11:29 AM
Hi,
Kindly advise on the below:
We are planning to build a cluster that will serve users in three regions (US, UK, APAC). What approaches can we follow to ensure that:
1. Each region has enough resources to perform its tasks.
2. Storage space for each region is secure and separate from the others.
3. Computation resources are sufficient for all regions.
Thanks,
Created 10-10-2016 01:43 PM
Are you saying you will have just one cluster to serve all these regions? Your question has almost no details. Can you please share your requirements? Please remember that one cluster should not span more than one data center. If you will have one cluster for all regions, you still size it based on your data volume and SLAs, and set the right expectations for users. For example, if your only cluster is in the US, then users in the UK and APAC should expect slower response times due to network latency. I don't think it affects cluster size. Please provide more details so we can help answer your question.
Created 10-10-2016 02:45 PM
Hi Mqureshi,
Yes, we have only one cluster, which we are planning to share among all three regions. We are aware of the impact of network latency on the cluster. We just want to know how to logically separate storage and capacity so that users in any region do not run into performance, security, or storage issues.
Please let me know if you need further details.
Thanks in advance.
Created 10-10-2016 02:51 PM
Well, that should be easy. You can use Apache Ranger to create groups for the different organizations and set authorization permissions. At the HDFS level, you can create directories like /region/US, /region/UK, and /region/APAC, with respective subdirectories to separate the data. Each of these directories and their subdirectories can have more granular permissions set through Ranger. You can also configure the cluster with Atlas for auditing and lineage information. HDFS storage quotas are another option if you want them, but it appears that you don't need them to start with.
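To make the directory layout concrete, here is a minimal sketch in Python that only generates the HDFS shell commands for the per-region directories; it does not run them. The group names (`us_users` and so on), the `770` mode, and the `10t` quota are assumptions for illustration, not values from this thread. It uses plain HDFS ownership and permissions as a baseline; finer-grained Ranger policies would be layered on top through Ranger itself.

```python
REGIONS = ["US", "UK", "APAC"]


def region_setup_commands(regions, base="/region", quota=None):
    """Build HDFS shell commands that create one directory per region.

    Group names and the optional quota are hypothetical placeholders.
    """
    commands = []
    for region in regions:
        path = f"{base}/{region}"
        group = f"{region.lower()}_users"  # hypothetical group name
        commands.append(f"hdfs dfs -mkdir -p {path}")
        commands.append(f"hdfs dfs -chown hdfs:{group} {path}")
        # Owner and group get full access; everyone else gets none.
        commands.append(f"hdfs dfs -chmod 770 {path}")
        if quota:
            # Optional space quota, e.g. "10t" for 10 terabytes.
            commands.append(f"hdfs dfsadmin -setSpaceQuota {quota} {path}")
    return commands


if __name__ == "__main__":
    for cmd in region_setup_commands(REGIONS, quota="10t"):
        print(cmd)
```

Review the printed commands before running them as the HDFS superuser. Leaving `quota` unset skips the quota step, which matches the suggestion that quotas are optional at the start.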
As for resource distribution, use YARN queues, for example one queue per region with its own share of the cluster capacity.
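One common way to divide YARN resources is a Capacity Scheduler queue per region. The reply does not name a scheduler, so that choice is an assumption here, as are the queue names and the 40/30/30 split. The sketch below only builds the property names and values that would go into capacity-scheduler.xml; it does not touch a cluster.

```python
def capacity_scheduler_properties(shares, max_share=100):
    """Build Capacity Scheduler properties for one queue per region.

    `shares` maps a queue name to its guaranteed percentage of the
    cluster. Queue names and percentages are hypothetical examples.
    """
    if sum(shares.values()) != 100:
        raise ValueError("queue capacities under root must sum to 100")
    prefix = "yarn.scheduler.capacity.root"
    props = {f"{prefix}.queues": ",".join(shares)}
    for queue, share in shares.items():
        # Guaranteed share of cluster resources for this queue.
        props[f"{prefix}.{queue}.capacity"] = str(share)
        # Upper bound the queue may grow to when other queues are idle.
        props[f"{prefix}.{queue}.maximum-capacity"] = str(max_share)
    return props


if __name__ == "__main__":
    example = capacity_scheduler_properties({"us": 40, "uk": 30, "apac": 30})
    for name, value in example.items():
        print(f"{name}={value}")
```

Setting `max_share` below 100 would keep a busy region from borrowing the whole cluster while the others are idle, at the cost of leaving some capacity unused.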
Created 03-27-2017 05:29 AM
Hi @mqureshi, in a case like the one Vikram describes, is it best to have one cluster that serves multiple regions, or should we consider multiple clusters, each serving a single region, so that we can restrict security, performance, and access based on the region?