Current scenario (no Ranger):
Hive security relies on HDFS file permissions (storage-based authorization). In the Hive warehouse, each database/table folder is owned by the user who created that database/table.
With Ranger, how do we manage Hive security?
Hive impersonation turned on:
In this case, I have to grant access to the underlying HDFS folders to any user who requests access to a Hive table. That means I have to create both an HDFS policy and a Hive policy to give a user access to a given Hive table.
It is difficult to create HDFS policies for each and every Hive database and table (there are thousands of tables). Is there a recommended approach?
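To make the duplication concrete, here is a minimal sketch of generating the paired Hive and HDFS policies for a list of tables instead of creating them by hand. It only builds the JSON payloads; it does not call Ranger. The payload shape follows the Ranger public v2 policy model as I understand it, so check it against your Ranger version. The service names, the warehouse root `/apps/hive/warehouse`, the `<db>.db/<table>` folder layout, and all function names are assumptions for illustration.

```python
import json

# Assumed warehouse root; adjust to your cluster's hive.metastore.warehouse.dir.
WAREHOUSE_ROOT = "/apps/hive/warehouse"


def hive_table_policy(hive_service, database, table, user):
    """Build a Ranger Hive policy payload granting SELECT on one table."""
    return {
        "service": hive_service,
        "name": f"hive-{database}-{table}-{user}",
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": ["*"]},
        },
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


def hdfs_table_policy(hdfs_service, database, table, user):
    """Build the matching Ranger HDFS policy for the table's folder.

    Assumes the table lives at <root>/<database>.db/<table>, which does
    not hold for the default database or for external tables.
    """
    path = f"{WAREHOUSE_ROOT}/{database}.db/{table}"
    return {
        "service": hdfs_service,
        "name": f"hdfs-{database}-{table}-{user}",
        "resources": {"path": {"values": [path], "isRecursive": True}},
        "policyItems": [
            {
                "users": [user],
                "accesses": [
                    {"type": "read", "isAllowed": True},
                    {"type": "execute", "isAllowed": True},
                ],
            }
        ],
    }


def policies_for_tables(hive_service, hdfs_service, tables, user):
    """Return the Hive and HDFS payloads for every (database, table) pair."""
    payloads = []
    for database, table in tables:
        payloads.append(hive_table_policy(hive_service, database, table, user))
        payloads.append(hdfs_table_policy(hdfs_service, database, table, user))
    return payloads


if __name__ == "__main__":
    demo = policies_for_tables(
        "cluster_hive", "cluster_hadoop", [("sales", "orders")], "alice"
    )
    print(json.dumps(demo, indent=2))
```

Each payload could then be sent to the Ranger Admin policy endpoint (`POST /service/public/v2/api/policy`) with whatever HTTP client and credentials you use. With thousands of tables, a database-level path with `isRecursive` set would reduce the policy count considerably.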
What we have done, for example, is make some assumptions about who will access the data and how. We break users down into two groups: Analysts and Power Users.
Analysts (90% of users) ONLY access the data via Hive; they never go through HDFS or use any other tools. Analysts also need column-level security in place to ensure they only access data that matches their clearance, i.e. public, pii, spii...
Power Users (10% of users) can access the datasets with any tool, from Hive or HDFS, and have no restrictions on the columns they can see. The service application also counts as a Power User, since it handles ingesting and prepping the data.
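As a rough illustration of the Analyst side, the sketch below builds a Hive policy payload that exposes only the columns at or below a group's clearance. The column-to-level mapping, the ordering public < pii < spii, the group name, and the function names are all hypothetical; the payload shape follows the Ranger public v2 policy model as I understand it and is not sent anywhere here.

```python
# Assumed ordering of clearance levels, lowest to highest.
CLEARANCE_ORDER = ["public", "pii", "spii"]


def visible_columns(column_levels, clearance):
    """Return the sorted column names whose level is at or below the clearance."""
    limit = CLEARANCE_ORDER.index(clearance)
    return sorted(
        name
        for name, level in column_levels.items()
        if CLEARANCE_ORDER.index(level) <= limit
    )


def analyst_column_policy(hive_service, database, table, column_levels, group, clearance):
    """Build a Hive policy granting SELECT on only the cleared columns."""
    columns = visible_columns(column_levels, clearance)
    return {
        "service": hive_service,
        "name": f"hive-{database}-{table}-{group}-{clearance}",
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": columns},
        },
        "policyItems": [
            {
                "groups": [group],
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical classification of one table's columns.
    levels = {"customer_id": "public", "email": "pii", "ssn": "spii"}
    policy = analyst_column_policy(
        "cluster_hive", "sales", "customers", levels, "analysts_pii", "pii"
    )
    print(policy["resources"]["column"]["values"])
```

Because Analysts only go through Hive, a policy like this needs no matching HDFS policy for them, provided queries do not run as the end user against HDFS.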
To facilitate this we did a few things
Hope this helps
For new Ranger implementations:
> Is there a way to bring the access/permissions for all Hive databases/tables into Ranger without manually creating them in Ranger?
> If there is no easy way to pull the current access/permissions into Ranger, does it require an overhaul of the security architecture for Hive databases/tables?