Architecture design for clients in different regions
Created ‎10-10-2016 11:29 AM
Hi,
Kindly advise on the below:
We are planning to build a cluster that will serve users in three regions (US, UK, APAC). What approaches can we follow to ensure that:
1. Every region has enough resources to perform its tasks.
2. Each region's storage space is secure and separate from the others.
3. Computation resources are sufficient for all regions.
Thanks,
Created ‎10-10-2016 02:51 PM
Well, that should be easy. Use Apache Ranger to create groups for the different organizations and set authorization permissions. At the HDFS level, you can create directories like /region/US, /region/UK, and /region/APAC, with subdirectories under each to separate the data. Each of these directories and their subdirectories can have more granular permissions set through Ranger, and you can configure the cluster with Atlas for auditing and lineage information. You can also use HDFS storage quotas if you want, but it appears you don't need them to start with.
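As a rough illustration of the directory layout and optional quotas described above, here is a minimal Python sketch that only builds the corresponding `hdfs` command lines and prints them for review; it does not run anything against a cluster. The group names (`us_users`, etc.), the `hdfs` owner, the `770` mode, and the quota sizes are all hypothetical placeholders, and Ranger policies would still be defined separately in Ranger itself.

```python
import shlex

# Hypothetical values: group names and quota sizes are illustrative only.
REGIONS = {
    "US": {"group": "us_users", "quota": "10t"},
    "UK": {"group": "uk_users", "quota": "5t"},
    "APAC": {"group": "apac_users", "quota": "5t"},
}


def region_commands(region, group, quota, base="/region"):
    """Return the HDFS CLI commands that set up one region's directory."""
    path = f"{base}/{region}"
    return [
        # Create the per-region directory (and parents if missing).
        ["hdfs", "dfs", "-mkdir", "-p", path],
        # Give the directory to the region's group (owner is an assumption).
        ["hdfs", "dfs", "-chown", f"hdfs:{group}", path],
        # Owner and group get full access, everyone else gets none.
        ["hdfs", "dfs", "-chmod", "770", path],
        # Optional space quota; replicas count against this limit.
        ["hdfs", "dfsadmin", "-setSpaceQuota", quota, path],
    ]


def all_commands(regions=REGIONS):
    """Collect the setup commands for every region, in order."""
    commands = []
    for region, cfg in regions.items():
        commands.extend(region_commands(region, cfg["group"], cfg["quota"]))
    return commands


if __name__ == "__main__":
    # Print the commands so an administrator can review them before running.
    for command in all_commands():
        print(shlex.join(command))
```

Printing rather than executing keeps the sketch safe to try anywhere; the quota commands can simply be dropped if quotas are not needed at the start.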
As for resource distribution, use YARN.
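One common way to split compute with YARN is a queue per region in the Capacity Scheduler. The sketch below, assuming the Capacity Scheduler is the active scheduler, generates the matching `capacity-scheduler.xml` properties. The queue names (`us`, `uk`, `apac`), the 40/30/30 split, and the 60% ceiling are hypothetical and would need to reflect the real workload of each region.

```python
# Hypothetical queue names and guaranteed shares (percent of the cluster).
QUEUES = {"us": 40, "uk": 30, "apac": 30}
# Hypothetical ceiling a queue may grow to when other queues are idle.
MAX_CAPACITY = 60


def capacity_scheduler_properties(queues, max_capacity):
    """Build Capacity Scheduler properties for one queue per region."""
    total = sum(queues.values())
    if total != 100:
        # Sibling queue capacities under root are expected to add up to 100.
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    prefix = "yarn.scheduler.capacity.root"
    props = {f"{prefix}.queues": ",".join(queues)}
    for name, percent in queues.items():
        # Guaranteed share of cluster resources for this region.
        props[f"{prefix}.{name}.capacity"] = str(percent)
        # Upper bound when borrowing idle capacity from other regions.
        props[f"{prefix}.{name}.maximum-capacity"] = str(max_capacity)
    return props


def to_xml(props):
    """Render the properties in Hadoop's <configuration> XML layout."""
    lines = ["<configuration>"]
    for key, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{key}</name>")
        lines.append(f"    <value>{value}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_xml(capacity_scheduler_properties(QUEUES, MAX_CAPACITY)))
```

The guaranteed `capacity` keeps one region from starving the others, while `maximum-capacity` lets a busy region borrow idle resources up to a limit. Queue ACLs could restrict who may submit to each queue, but they are left out of this sketch.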
Created ‎10-10-2016 01:43 PM
Are you saying you will have just one cluster to serve all these regions? Your question has almost no details. Can you please share your requirements? Please remember that a single cluster should not span more than one data center. If you will have one cluster for all regions, then you still size it based on your data volume and SLAs and set the right expectations for users. For example, if your only cluster is in the US, then users in the UK and APAC should expect slower response times due to network latency. I don't think that affects cluster size. Please provide more details so we can help answer your question.
Created ‎10-10-2016 02:45 PM
Hi Mqureshi,
Yes, we have only one cluster, which we are planning to share among all three regions. We are aware of the impact of network latency on the cluster. We just want to know how to separate storage and capacity logically so that users in one region do not run into performance, security, or storage issues caused by another.
Please let me know if you need further details.
Thanks in advance.
Created ‎03-27-2017 05:29 AM
Hi @mqureshi, in a case like this (the one vikram describes), is it best to have one cluster that serves multiple regions, or should we consider multiple clusters, each serving a single region, so that we can restrict security, performance, and access based on region?
