I would like to know whether Federated Namespaces (HDFS Federation) would help us in the following situations:

- A sandbox for STG HBase (as well as HDFS).
- A sandbox for deploying a new version of our application into one (STG) namespace while the other (PROD) namespace still runs the old version, without the two interfering with each other.
- Limiting resource usage per namespace while still making full use of the available resources.

Our current Hadoop environment
-------------------------------------

We have a very small, VM-based QA cluster that is used for testing configuration changes and CDH upgrades, and for small functional tests before moving new code to the prod CDH cluster. (No performance measurement is done on the QA cluster because it can't access data on the production system.)

We have a small Prod cluster with 15 data nodes. This prod cluster is used for the actual functional and performance testing as well as for the production runs.

To protect the two types of runs (prod and testing) from each other, we manage namespaces programmatically ourselves through a naming convention: a prefix (STG_ or PROD_) on HBase table names and HDFS directory names. We do this instead of relying on HDFS Federation to manage the HDFS namespaces.

Question 1: I know we can use HDFS Federation to separate namespaces for HDFS, but I am not sure whether it provides namespaces for HBase as well. In other words, when we run Spark jobs (or Sqoop) against the STG namenode of the federation, we would like to restrict access to HBase and other systems to the STG namespace only. We don't want to manage HBase table names with a prefix (STG or PROD), because we may accidentally access the other namespace and ruin its data. This way, we could safely use the same HBase table name for both STG and PROD jobs without colliding with HBase in the other environment.

Question 2: I also wonder whether we can run a different version of our application in each namespace. For example, can we deploy a new version of the application to the STG namespace but not yet to the PROD namespace, so that we can test our changes in STG before pushing them to PROD?

Question 3: I know YARN provides the Fair Scheduler (CDH 5 sets it as the default). With federated namespaces, can we configure the Fair Scheduler (or something similar) between the STG and PROD namespaces? For example, when no job is running in the PROD namespace, we would like STG jobs to use the full resources (memory and CPU). But when any PROD job starts, STG jobs that are already running must release resources to the PROD jobs, up to a preconfigured quota.
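
To make the prefix convention mentioned above concrete, here is a minimal Python sketch of how such naming might be done. The helper `qualified_name`, the `VALID_ENVS` tuple, and the table name `orders` are hypothetical and are not taken from the actual application; only the STG_ / PROD_ prefixes come from the description above.

```python
# Hypothetical illustration of a prefix-based naming convention.
# Only the STG_/PROD_ prefixes come from the post; all other names are assumptions.

VALID_ENVS = ("STG", "PROD")


def qualified_name(env: str, name: str) -> str:
    """Return the environment-prefixed HBase table or HDFS directory name."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{env}_{name}"


if __name__ == "__main__":
    # The same logical table maps to two different physical names.
    print(qualified_name("STG", "orders"))
    print(qualified_name("PROD", "orders"))
```

The weakness this shows is the one behind Question 1: nothing but the `env` argument separates the two environments, so a job that passes the wrong value will quietly read or write the other environment's table.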