Dear fellow CDH users, we have set up Sentry on our CDH 5.2.1 HA cluster. We can successfully create tables through Hive, but the LOAD DATA command keeps failing despite having issued a variety of grants:
LOAD DATA INPATH '/tmp/mytable.dlv' OVERWRITE INTO TABLE `MYTABLE`
ERROR: java.sql.SQLException: Error while compiling statement: FAILED:
SemanticException No valid privileges
Required privileges for this
We tried to grant the following privileges:
grant all on uri 'hdfs://nameservice1:8020/tmp' to role users; - and -
grant all on uri 'hdfs://nameservice1/tmp' to role users;
But the command still fails. We had to issue a "grant all on server server1" command for the LOAD to succeed.
Any insight into what kind of privilege is required for the LOAD to run without errors?
Sentry does not manage HDFS file permissions unless the HDFS/Sentry plugin is enabled, and that plugin is only available in CDH 5.3 and higher. Until then, you will have to use file ACLs (facls) on HDFS to allow access to your folders. Please note that facls on HDFS are name-specific and do not adhere to groups from AD or other KDC providers, so you will have to use the name of the user on the files.
For updates, you also need to add the user to the directory's default ACL; otherwise the user's entry will not be applied to new files as they are created.
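As a rough sketch of the ACL setup described above: the user name `alice` and the directory `/data/landing/mytable` are made up for illustration, and this assumes ACLs are enabled on the NameNode (`dfs.namenode.acls.enabled` set to `true`). The `run` helper is only a dry-run wrapper so the commands can be reviewed before they touch the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical values for illustration; substitute your own user and path.
ACL_USER="alice"
TARGET_DIR="/data/landing/mytable"

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

grant_acls() {
  # Access ACL: gives the named user rwx on everything that already exists.
  run hdfs dfs -setfacl -R -m "user:${ACL_USER}:rwx" "$TARGET_DIR"
  # Default ACL on the directory: inherited by files and subdirectories
  # created under it later. Existing subdirectories need their own default ACL.
  run hdfs dfs -setfacl -m "default:user:${ACL_USER}:rwx" "$TARGET_DIR"
}

grant_acls
```

Running it with `DRY_RUN=0` executes the commands for real; `hdfs dfs -getfacl /data/landing/mytable` should then list both the `user:alice` and `default:user:alice` entries.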
Another issue: when you land files, they are owned by the user who lands them. This causes problems with Hive and Impala, which cannot access the files unless they own them. We fixed this by running a chown/chmod at the end of each of our landing jobs, changing the owner to hive:hive and setting the permissions to 770.
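The final step of a landing job might look roughly like this. The directory `/data/landing/mytable` is a placeholder, not a path from our cluster, and the `run` wrapper is just a dry-run helper. Note that changing a file's owner in HDFS requires superuser rights, so the job has to run as a user that is allowed to do that.

```shell
#!/usr/bin/env bash
# Hypothetical landing directory; substitute the path your job writes to.
LANDING_DIR="/data/landing/mytable"

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

finalize_landing() {
  # Hand the landed files over to the hive user and group.
  run hdfs dfs -chown -R hive:hive "$LANDING_DIR"
  # Owner and group get full access; everyone else gets none.
  run hdfs dfs -chmod -R 770 "$LANDING_DIR"
}

finalize_landing
```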
My best recommendation would be to upgrade to 5.4 and use the HDFS/Sentry sync, so that all your permissions are managed through Sentry instead of through a mix of HDFS ACLs and Sentry.
Hope this helps!