I would like to ask for advice. I use Hive and Impala with Sentry synchronization, and I am getting some unexpected results.
Let's imagine the following directory tree: a directory "main" under /top, with a subdirectory "sub" (that is, /top/main/sub).
Let's run this command in Impala:
GRANT ALL ON URI 'hdfs://nameservice/top/main' TO ROLE my_role;
A user who belongs to this group can create an external table with a LOCATION clause pointing at the directory "main", but not at the subdirectory "sub". I had to issue an explicit GRANT ALL ON URI for the subdirectory to help the user.
This is unexpected behaviour and I am very confused. Does anybody have an idea what the problem could be, and how to fix it without granting a URI privilege on every subdirectory?
My version of CDH is Cloudera Enterprise 5.5.1. Impala is 2.3.0-cdh5.5.1 RELEASE and Sentry is Apache Sentry 1.5.1-cdh5.5.1.
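For reference, the workaround looked roughly like this. It is only a sketch with the same placeholder names as above; the subdirectory path is assumed to be "sub" directly under "main".

```sql
-- Original grant on the parent directory (as issued above).
GRANT ALL ON URI 'hdfs://nameservice/top/main' TO ROLE my_role;

-- Workaround: an additional explicit grant on the subdirectory.
-- Assumption: "sub" sits directly under "main".
GRANT ALL ON URI 'hdfs://nameservice/top/main/sub' TO ROLE my_role;

-- List the privileges the role currently holds.
SHOW GRANT ROLE my_role;
```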
Does the URI contain uppercase letters? If yes, it could be IMPALA-2695, fixed in CDH 5.5.2.
Thank you for the remark. It is definitely useful, even if it does not solve the problem.
The name of the HA nameservice is capitalized. However, that bug would also prevent the grant from working on the directory that is used directly as the location of the external table. My problem is that the grant on the parent directory does not give access to its subdirectories. When a subdirectory receives a direct grant, it works (the nameservice is always repeated as part of the URI).
Thanks for clarifying. How are you creating the table? Can you paste the SQL here?
create external table mytest (c1 STRING) location 'hdfs://NAMESERVICE/top/main/sub';
To be consistent with the example I gave in my first post, I used the same directory names.
HDFS-Sentry Sync works on DATABASE locations and TABLE locations. URIs are not considered when enforcing HDFS ACLs through Sentry. This is a common misunderstanding.
When a database or table is created, the HDFS-Sentry Sync plugin will automatically apply the appropriate HDFS ACLs for you based on the privileges assigned in Sentry. There is no need to perform any manual actions.
The HDFS-Sentry Sync plugin interactions involve the HDFS NameNode, the Sentry Server, and the Hive Metastore. Any time a database/table is created/dropped in the Hive Metastore (whether the request comes from Hive, Impala, etc.), the LOCATION of the database/table is sent to Sentry, then forwarded on to the HDFS NameNode Sentry Sync plugin. The plugin is then responsible for assigning the HDFS ACLs to those locations. Again, URIs are not considered.
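As a minimal sketch of the kind of privilege the sync does act on, a grant on the table object itself (rather than on a URI) is what would be reflected as HDFS ACLs on the table's LOCATION. The database name my_db is hypothetical; mytest and my_role are the names used earlier in this thread.

```sql
-- Assumption: the table lives in a hypothetical database named my_db.
-- A privilege on the table object (not a URI) is what the sync plugin maps to ACLs.
GRANT SELECT ON TABLE my_db.mytest TO ROLE my_role;

-- List the privileges the role currently holds.
SHOW GRANT ROLE my_role;
```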