Created 08-06-2018 01:23 PM
I have a couple of external Hive tables for which I need to grant a group of users access to only the non-sensitive columns with Ranger (HDP 2.6.3). But during some tests with a test user I found that he can only access these non-sensitive columns if he also has access to the underlying path on HDFS. The HDFS path is secured by Ranger as well; I've set the HDFS permissions to no access at all.
Needing HDFS access for those users would defeat the purpose of granting access to a selection of columns, because the HDFS files contain all the sensitive data as well.
Is there a way to both protect the HDFS files and grant access to a selection of columns?
Example create table statement:
CREATE EXTERNAL TABLE berth_data (
  `MUTATION_TYPE` string,
  `D_MUTATION` string,
  `T_MUTATION` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/production/sensitive/berth_data';
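To make the goal concrete, this is roughly what it should look like for a restricted user once the Ranger Hive policy is in place (just a sketch; I'm assuming here that MUTATION_TYPE is one of the non-sensitive columns included in the policy):

-- Allowed: only references a column included in the Ranger Hive policy
SELECT mutation_type FROM berth_data LIMIT 10;

-- Denied by Ranger with a HiveAccessControlException, because it also touches
-- columns that are not in the policy
SELECT * FROM berth_data;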
Created 08-06-2018 01:43 PM
@Marcel-Jan Krijgsman have you configured hive.server2.enable.doAs=false? If it is set to true, all access to HDFS is made as the calling user. If it is set to false, access to HDFS is made as the hive service user. I believe the recommended approach is to set this value to false when using Ranger Hive SQL authorization.
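A quick way to confirm the current setting from a beeline session (the property itself is server-side, so actually changing it means updating hive-site.xml, via Ambari on HDP, and restarting HiveServer2):

-- Should print the effective value for the HiveServer2 you are connected to
SET hive.server2.enable.doAs;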
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 08-07-2018 09:12 AM
@Felix Albani Thanks for that answer. Looks like I'm faced with an interesting choice:
Change hive.server2.enable.doAs to false, so that Hive accesses HDFS as the HiveServer2 process user. Then I can restrict access to columns for users in Hive, without them getting access to the HDFS files. So the choice of Hive permissions I make will be much more important.
Keep hive.server2.enable.doAs=true, in which case I will not be able to do column-based access in Hive, but I have the comfort that if someone gets access to a Hive table without the HDFS access, they still cannot get to the data.
I'll have to think about this.