CDH 5.4 and HDFS Sentry Plugin Sync

Contributor

I have CDH 5.4 installed with Sentry enabled. I have also enabled HDFS Sentry Plugin Sync with Extended ACLs. I can access authorised tables via Hive and Impala but cannot look at the data via HDFS (hadoop fs); I always get a permissions error. I have checked the permissions on those files/folders/tables and can see that the group the user belongs to does not appear in the list of groups permissioned for that file/folder/table. I have checked this via hadoop fs, hdfs dfs -getfacl and Hue. I also note that one of the groups created in the ACLs is blank, i.e. has no group name associated with it. How can I check that HDFS Sentry Sync is functioning correctly? FYI I have Kerberos enabled and am using Unix groups to support this test environment.

 

Thanks

Shailesh

2 REPLIES

Re: CDH 5.4 and HDFS Sentry Plugin Sync

New Contributor

Hey Shailesh,

I spent about 2 months working with support on this and found that, in addition to setting the HDFS prefixes and clicking the enable button, there is a bit of extra configuration you have to apply to get the sync working. See below for the additions.

 

Added the prefix information to the below safety valves in Hive:

Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml

Hive Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml

 

<property>
<name>sentry.hdfs.integration.path.prefixes</name>
<value>/user/hive/warehouse</value>
</property>

(The same property goes into both safety valves. Add other paths to the value as needed, separated by commas.)
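As a quick sanity check on the prefix list, here is a minimal sketch that tests whether a database location falls under one of the comma-separated prefixes. The function name under_prefix and the path /data/other.db are made up for illustration, and it assumes the value contains no spaces around the commas. It is plain bash and does not talk to the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CDH or Sentry): succeeds if path $2
# lies under one of the comma-separated prefixes given in $1.
under_prefix() {
  local prefixes=$1 path=$2 prefix
  local IFS=','                      # split the prefix list on commas
  for prefix in $prefixes; do
    prefix=${prefix%/}               # tolerate a trailing slash
    case "$path" in
      "$prefix"|"$prefix"/*) return 0 ;;
    esac
  done
  return 1
}

# Example value for sentry.hdfs.integration.path.prefixes (assumed)
PREFIXES="/user/hive/warehouse,/user/database1.db"

for p in /user/hive/warehouse/test.db /user/database1.db /data/other.db; do
  if under_prefix "$PREFIXES" "$p"; then
    echo "covered: $p"
  else
    echo "NOT covered: $p"
  fi
done
```

A database location reported as not covered would need its path (or a parent) added to the property value before the sync can manage it.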

 

We also had issues browsing to nested folders. For example, say you have "/user/hive/warehouse/test.db". The "test.db" folder is your Sentry object and is where the Sentry/HDFS permissions will be applied. However, if your users do not have read/execute access through the folders /user, /user/hive, and /user/hive/warehouse, they will not be able to browse to the files in the Sentry object.

 

To fix this we ran chmod 775 on the root-level directories to give users access, so it is very important that you do not have any sensitive data in those directories. Please note we did not do this recursively; it is applied only at the folder level. This allows anyone to browse the folders and read the data all the way up to your Sentry object, which is your DB; from that point on they have to be Sentry-authorized to access the data.

 

Commands run:

hadoop fs -chmod 775 /user

hadoop fs -chmod 775 /user/hive

hadoop fs -chmod 775 /user/hive/warehouse
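To see which directories need that read/execute access for a given database, something like the sketch below might help. The ancestors function is a made-up helper, and the loop is a dry run that only prints the hadoop fs -ls -d commands; remove the echo to actually run them on a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every parent directory of an absolute path,
# top-down, excluding the path itself and the root "/".
ancestors() {
  local path=${1%/} dir
  local -a stack=()
  dir=$(dirname "$path")
  while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    stack=("$dir" "${stack[@]}")     # prepend so the output is top-down
    dir=$(dirname "$dir")
  done
  if [ "${#stack[@]}" -gt 0 ]; then
    printf '%s\n' "${stack[@]}"
  fi
}

# Dry run: show the command that would list each parent directory's own
# permissions (-d lists the directory itself, not its contents).
ancestors /user/hive/warehouse/test.db | while IFS= read -r dir; do
  echo hadoop fs -ls -d "$dir"
done
```

Each listed directory should show r-x for the users in question (for example drwxrwxr-x after the chmod 775 above).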

 

Hope this helps!

 

P.S.

We had an issue that drove me nuts and took weeks to figure out: the permissions on our root-level directories kept being reset to 770 whenever we restarted the cluster. The reason was that we had a database located in one of our root-level directories, so when the Sentry/HDFS sync runs it chmods everything from the database level recursively to 770. For example, we had one database located at "/user/database1.db" and another at "/user/hive/warehouse/test.db". Users could browse the HDFS files in "database1.db" but could not browse to "test.db", as the folders had changed back to 770. Frustrating, as there is no real documentation around this.

Re: CDH 5.4 and HDFS Sentry Plugin Sync

Hi Rusty,

The issue with sentry.hdfs.integration.path.prefixes will be fixed in the next maintenance release for CM 5.4. What you added here looks right.

I think /user and /user/hive usually have read and execute for everybody, at least that's how CM will set it up for you by default. If this had been changed since, then it makes sense you had to change it back.

If you have HDFS sentry sync on, then only authorized users can see the files (that's the whole point). So users should only see test.db if they have appropriate permissions in Sentry to do so. The permissions should definitely be hive:hive 770, with ACLs as necessary for other users to access that table.
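One rough way to check whether the expected ACL entry is present is to look for the user's group in the getfacl output. The sketch below is only an illustration: has_group_acl is a made-up helper, the group name analysts is hypothetical, and the sample text is hand-written, not real cluster output. Note that in getfacl output the unnamed "group::" line is the entry for the directory's owning group, so a blank group name there is expected rather than a sign of a fault.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the getfacl-style text on stdin
# contains a named ACL entry for group $1 (exact match on the name).
has_group_acl() {
  awk -F: -v g="$1" '$1 == "group" && $2 == g { found = 1 } END { exit !found }'
}

# Hand-written sample in getfacl format (illustrative only).
sample_acl='# file: /user/hive/warehouse/test.db
# owner: hive
# group: hive
user::rwx
group::---
group:analysts:r-x
mask::r-x
other::---'

for grp in analysts sales; do
  if printf '%s\n' "$sample_acl" | has_group_acl "$grp"; then
    echo "ACL entry found for group $grp"
  else
    echo "no ACL entry for group $grp"
  fi
done

# On a cluster, the same check could be fed from the real commands:
#   hdfs groups <username>          # groups HDFS resolves for the user
#   hdfs dfs -getfacl /user/hive/warehouse/test.db | has_group_acl <group>
```

If the group that holds the Sentry role never shows up as a named entry, the sync is probably not covering that path.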

Thanks,
Darren