Created 10-01-2015 06:09 PM
As per title subject.
Is there a way to set up Ranger against Hive so that, if no policy exists, authorization passes through to the HDFS permissions rather than being denied immediately? We had been considering setting up two different HiveServers, one using storage-based authorization and another using Ranger, but we are not sure that is possible given that Ambari now makes this a toggle switch.
We only have a handful of tables that really require Ranger for column authorization (and of course this data on HDFS will be owned by Hive). All the other tables don't require column authorization and make extensive use of HDFS extended ACLs, so many users have already been added to these storage policies.
The ops team does not really want to migrate these unless absolutely required. They had started writing a script to do this in dev, and we end up with thousands of policies that just make a total mess of everything.
Thoughts are welcome.
@Hive
@Security
@Ranger
Created 10-01-2015 08:25 PM
This is tricky. It is best to migrate the policies from HDFS back to HS2. Remember, the mapping from extended ACLs to Ranger policies is not 1:1. We can optimize the Ranger policies using wildcards, multiple resources, and groups in a single policy, so it should not be 1000 policies in Ranger at the end.
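To illustrate the "not 1:1" point, here is a rough sketch of how a migration script might merge ACL entries before creating policies. It assumes the ACLs have already been exported as `(path, principal, perms)` tuples; the `consolidate` function, the sample paths, and the output dict layout are all made up for illustration and are not Ranger's policy format or API.

```python
from collections import defaultdict


def consolidate(acl_entries):
    """Merge paths that share an identical set of grants into one policy.

    acl_entries: iterable of (path, principal, perms) tuples (hypothetical
    export format, not something HDFS or Ranger produces directly).
    """
    # Step 1: collect every (principal, perms) grant per path.
    grants_by_path = defaultdict(set)
    for path, principal, perms in acl_entries:
        grants_by_path[path].add((principal, perms))

    # Step 2: group paths whose grant sets are exactly equal.
    paths_by_grants = defaultdict(list)
    for path, grants in grants_by_path.items():
        paths_by_grants[frozenset(grants)].append(path)

    # Step 3: emit one policy-like dict per distinct grant set.
    return [
        {"resources": sorted(paths), "grants": sorted(grants)}
        for grants, paths in paths_by_grants.items()
    ]


# Hypothetical sample data: three sources, two of them with identical access.
acl_entries = [
    ("/data/src_a", "group:analysts", "r-x"),
    ("/data/src_b", "group:analysts", "r-x"),
    ("/data/src_c", "group:analysts", "r-x"),
    ("/data/src_c", "group:etl", "rwx"),
]
policies = consolidate(acl_entries)
print(len(acl_entries), "ACL entries ->", len(policies), "policies")
```

How much this helps depends on how many sources really share the same access pattern. If every source has its own unique set of users, the count will not drop much.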
Created 10-02-2015 12:54 PM
I think we are going to try the route of having two HS2 instances, one with Ranger integration and one without. No matter what we do we will end up with thousands of policies, as there are potentially thousands of different data sources being onboarded to the system. Most of these don't require anything more than HDFS security, so having to enter them into Hive for access is a management nightmare.
By having a 'Ranger HS2 with doAs=false' and a 'non-Ranger HS2 with doAs=true', we can at least decide which tables require Ranger benefits, force ownership of that data to Hive, and create those policies as a one-off, rather than creating policies for every new data source that gets onboarded.
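On the client side, the split could be as simple as a lookup that picks the HS2 endpoint per table. This is only a sketch of the idea: the hostnames, the `RANGER_TABLES` set, and the `hs2_url_for` helper are placeholders I made up, and port 10000 is just the usual HS2 default.

```python
# Hypothetical endpoints for the two HiveServer2 instances.
RANGER_HS2 = "jdbc:hive2://hs2-ranger.example.com:10000"    # Ranger, doAs=false
STORAGE_HS2 = "jdbc:hive2://hs2-storage.example.com:10000"  # HDFS ACLs, doAs=true

# Hypothetical list of the handful of tables that need column authorization.
RANGER_TABLES = {
    ("finance", "payroll"),
    ("hr", "employees"),
}


def hs2_url_for(database, table):
    """Return the JDBC URL of the HS2 instance that should serve this table."""
    base = RANGER_HS2 if (database, table) in RANGER_TABLES else STORAGE_HS2
    return f"{base}/{database}"


print(hs2_url_for("finance", "payroll"))
print(hs2_url_for("weblogs", "clicks"))
```

Only the tables listed in `RANGER_TABLES` would need Ranger policies and Hive ownership of the data; everything else keeps using the existing HDFS ACLs.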
Today, after a simple migration from ACLs, we had 17+ pages of policies in Ranger, which we all thought was crazy when most of them don't need Ranger's benefits. Note that this was already using wildcards for resources and multiple users and groups per policy.