Created on 01-12-2015 05:23 AM - edited 09-16-2022 02:18 AM
Hi,
PROBLEM: We upgraded our cluster from CDH 5.1 to 5.3. After the upgrade, some of our queries fail with a permission-denied error. This happens when the user running the query reads a table whose underlying data is owned by another user.
EXAMPLE: We have a Flume agent writing logs to "/user/flume/logs" as user flume. The Hive table logs is created over that directory (with CREATE EXTERNAL TABLE). When we run a read-only query (SELECT) as user bi, we get the following error:
FAILED: RuntimeException Cannot create staging directory 'hdfs://nameservice/user/flume/logs/.hive-staging_hive_2015-01-12_10-43-00_285_2638530316386815724-1': Permission denied: user=bi, access=WRITE, inode="/user/flume/logs":flume:flume:drwxr-xr-x
This is because the job, running as user bi, tries to create the directory ".hive-staging_hive_2015-01-12_10-43-00_285_2638530316386815724-1" in "/user/flume/logs", which is owned by user flume, and bi has no write permission there. In my opinion, a read-only user such as bi shouldn't be able to write to a production directory such as "/user/flume/*", but they should be able to read it (which includes running queries on it). This never happened in previous versions of CDH.
If I run a similar query as user bi over data owned by the same user bi, everything works fine, except that those ".hive-staging_hive*" directories are still created in the table's location, and they contain all the data retrieved by the query, wasting our HDFS space. These directories should be temporary anyway and should be deleted after a while.
SOLUTION: After some research, it seems that Hive needs a staging directory, which is configured by the hive.exec.stagingdir configuration property. Before the upgrade this property had no value. After the upgrade it has the value ".hive-staging", a relative path that is resolved inside the table's location. We changed it to "/tmp/hive-staging" and now everything works fine. Please fix this default value, as other CDH users might run into the same problem. Additionally, it would be nice to have this property in the Cloudera Manager web interface. To make it work, we had to put it in the safety valve.
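For anyone hitting the same error, the override we applied looks roughly like the fragment below. This is a sketch of our workaround, not an official fix: the property name and the "/tmp/hive-staging" value are the ones mentioned above, and we are assuming the fragment goes into hive-site.xml through the safety valve. Adjust the path to whatever world-writable scratch location suits your cluster.

```xml
<!-- Workaround sketch: move Hive's staging directory out of the table location. -->
<!-- Assumption: pasted into the hive-site.xml safety valve in Cloudera Manager. -->
<property>
  <name>hive.exec.stagingdir</name>
  <!-- Absolute path, so read-only users no longer need WRITE on the table directory. -->
  <value>/tmp/hive-staging</value>
</property>
```

After redeploying the configuration and restarting the Hive services, the SELECT as user bi no longer tried to write under "/user/flume/logs".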
Best regards,
Călin-Andrei Burloiu.
Created on 01-12-2015 09:16 AM - edited 01-12-2015 09:17 AM
Hello,
Thank you for the detailed report! This is a bug in the Hive HDFS Encryption integration in CDH 5.3.0. When a user has read-only access, Hive is supposed to fall back to a directory in /tmp. I have created a DISTRO JIRA to track this: DISTRO-681 - Bug in HDFS Encryption for read only users.
We'll fix this ASAP.
Brock
Created 01-16-2015 10:05 AM
It looks like DISTRO-682 describes the same problem:
Simple queries fail with permissions error on scratch dir when querying read-only tables in HDFS
Should the two be linked and the later one resolved?
Thanks,
Thad