Created 08-01-2017 05:49 PM
We are moving our Oracle "landing" data into Hadoop. In Oracle we have three environments and three Oracle databases: dwdev, dwtest, and dwprod. The goal is to have three separate "landing" zones in Hadoop that will feed into each Oracle database, respectively, i.e. Hadoop dev feeds Oracle dwdev, etc.
The dev and test Hadoop environments will exist on a single physical Hadoop cluster.
How do we architect this?
database namespace (or schema_owner) = db_marketing
table name = customer_master
In DEV select * from db_marketing.customer_master would source from /dev/data/marketing/customer_master
In TEST select * from db_marketing.customer_master would source from /test/data/marketing/customer_master
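To make the intended mapping concrete, here is a minimal Python sketch of the path convention in the two examples above. The helper name `landing_path`, the `ENVIRONMENTS` tuple, and the idea of passing the subject area ("marketing") separately from the database name are assumptions for illustration only; nothing here is a Hive or HDFS API.

```python
from posixpath import join

# Assumption: only the two environments that share the physical cluster.
ENVIRONMENTS = ("dev", "test")


def landing_path(env: str, subject_area: str, table: str) -> str:
    """Build the HDFS directory for a table as /<env>/data/<subject_area>/<table>."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return join("/", env, "data", subject_area, table)


if __name__ == "__main__":
    # The same logical table resolves to a different directory per environment.
    for env in ENVIRONMENTS:
        print(env, "->", landing_path(env, "marketing", "customer_master"))
```

The point of the sketch is that the logical name db_marketing.customer_master stays the same while only the environment prefix of the path changes.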
Does this require multiple metastores?
What is best practice for multiple environments on a single Hadoop cluster?
Created 08-04-2017 01:04 PM
Hi Kimberlee,
First point: Hadoop has a concept of resource management in which you can assign cluster resources to specific groups through queue management.
Looking into how queues work in the YARN Capacity Scheduler should help you distribute resources between your Test and Dev environments.
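As a rough illustration of the queue idea, the Python sketch below generates the Capacity Scheduler properties for one queue per environment. The queue names `dev` and `test`, the 50/50 split, and the helper name `capacity_scheduler_properties` are assumptions chosen for the example; the property keys follow the `yarn.scheduler.capacity.root...` naming used in `capacity-scheduler.xml`.

```python
from typing import Dict


def capacity_scheduler_properties(shares: Dict[str, int]) -> Dict[str, str]:
    """Return capacity-scheduler.xml properties for queues directly under root.

    `shares` maps a queue name to its percentage of cluster capacity.
    """
    # Sibling queue capacities under the same parent must add up to 100 percent.
    if sum(shares.values()) != 100:
        raise ValueError("queue capacities under root must add up to 100")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(shares)}
    for queue, percent in shares.items():
        props[f"yarn.scheduler.capacity.root.{queue}.capacity"] = str(percent)
    return props


if __name__ == "__main__":
    # Assumption: an even split between the two environments.
    for name, value in capacity_scheduler_properties({"dev": 50, "test": 50}).items():
        print(f"{name}={value}")
```

Jobs from each environment would then be submitted to their own queue, so a heavy Test run cannot take all of the resources that Dev needs.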
Creating a separate database (schema), schema owner, and schema group for each environment will help you apply security at the HDFS level and keep the two environments isolated from each other.
Ranger policies can help you define security for the two different metastores for the Dev and Test environments and keep the data safe. Hope this helps.