[CDH 5.3] Hive staging directory has wrong default value

New Contributor

Hi,

PROBLEM: We upgraded our cluster from CDH 5.1 to 5.3. Since the upgrade, some of our queries fail with "permission denied". This happens when a user queries a table whose underlying data is owned by another user.

EXAMPLE: We have a Flume agent, running as user flume, that writes logs to "/user/flume/logs". A Hive table, logs, is created over that directory (with CREATE EXTERNAL TABLE). When we run a read-only query (SELECT) as user bi, we get the following error:

FAILED: RuntimeException Cannot create staging directory 'hdfs://nameservice/user/flume/logs/.hive-staging_hive_2015-01-12_10-43-00_285_2638530316386815724-1': Permission denied: user=bi, access=WRITE, inode="/user/flume/logs":flume:flume:drwxr-xr-x

This is because the job, running as user bi, tries to create the directory ".hive-staging_hive_2015-01-12_10-43-00_285_2638530316386815724-1" in "/user/flume/logs", which is owned by user flume, and bi does not have permission to do this. In my opinion a read-only user such as bi shouldn't be able to write to a production directory such as "/user/flume/*", but should be able to read it (which includes running queries on it). This never happened in previous versions of CDH.
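To see why the write is denied, here is a simplified Python model of the owner/group/other check applied to the inode in the error ("flume:flume:drwxr-xr-x"). It is a sketch under stated assumptions, not HDFS's actual code: it ignores the HDFS superuser and ACLs, and assumes bi is not a member of the flume group. The function name can_write is hypothetical.

```python
from __future__ import annotations


def can_write(mode: str, owner: str, group: str,
              user: str, user_groups: set[str]) -> bool:
    """Simplified HDFS-style write check (assumes no superuser, no ACLs).

    `mode` is the symbolic string from the error, e.g. "drwxr-xr-x":
    character 0 is the file type, then the owner, group and other triplets.
    """
    if user == owner:
        bits = mode[1:4]      # owner triplet
    elif group in user_groups:
        bits = mode[4:7]      # group triplet
    else:
        bits = mode[7:10]     # other triplet
    return bits[1] == "w"     # middle character of a triplet is the write bit


# Inode from the error: "/user/flume/logs":flume:flume:drwxr-xr-x
# Assumption: bi's only group is "bi" (not "flume").
print(can_write("drwxr-xr-x", "flume", "flume", "bi", {"bi"}))        # False
print(can_write("drwxr-xr-x", "flume", "flume", "flume", {"flume"}))  # True
```

Because bi falls into the "other" class, it only gets r-x on /user/flume/logs, so it can list and read the data but cannot create a staging directory inside it.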

If I run a similar query as user bi over data owned by bi, everything works, except that those ".hive-staging_hive*" directories are still created in the table's location and contain all the data retrieved by the query, wasting HDFS space. These directories should be temporary anyway, and should be deleted after a while.

 

SOLUTION: After some research, it seems that Hive needs a staging directory, which is configured by the hive.exec.stagingdir configuration property. Before the upgrade this property had no value; after the upgrade its value is ".hive-staging". We changed it to "/tmp/hive-staging" and now everything works. Please fix this default value, as other CDH users might run into the same problem. Additionally, it would be good to have this property in the Cloudera Manager web interface; to make it work, we had to put it in a safety valve.
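The directory name in the error suggests how a relative hive.exec.stagingdir value ends up inside the table's location. The Python sketch below models that under assumptions: the hypothetical staging_path function joins the staging value onto the table location (an absolute value replaces the location entirely, as posixpath.join does), then appends the "_<execution id>-<n>" suffix seen in the error message. We assume Hive resolves the path in a similar way; this is an illustration, not Hive's source.

```python
import posixpath


def staging_path(table_location: str, staging_dir: str,
                 execution_id: str, n: int = 1) -> str:
    """Hypothetical model of where a query's staging directory is created.

    A relative staging_dir (".hive-staging") lands inside table_location;
    an absolute one ("/tmp/hive-staging") replaces it, because posixpath.join
    discards earlier parts when a later part is absolute.
    """
    base = posixpath.join(table_location, staging_dir)
    # Suffix format copied from the error message: _<execution id>-<n>
    return f"{base}_{execution_id}-{n}"


exec_id = "hive_2015-01-12_10-43-00_285_2638530316386815724"
print(staging_path("/user/flume/logs", ".hive-staging", exec_id))
# /user/flume/logs/.hive-staging_hive_2015-01-12_10-43-00_285_2638530316386815724-1
print(staging_path("/user/flume/logs", "/tmp/hive-staging", exec_id))
# /tmp/hive-staging_hive_2015-01-12_10-43-00_285_2638530316386815724-1
```

With the default ".hive-staging", the model reproduces the path from the error (minus the hdfs://nameservice prefix); with "/tmp/hive-staging", the staging directory moves under /tmp, consistent with the workaround of changing the property to an absolute path.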

 

Best regards,
Călin-Andrei Burloiu.

1 ACCEPTED SOLUTION

Contributor

Hello,

 

Thank you for the detailed report! This is a bug in the Hive HDFS Encryption integration in CDH 5.3.0. When a user has read-only access, Hive is supposed to fall back to a directory in /tmp. I have created a DISTRO JIRA to track this: DISTRO-681, "Bug in HDFS Encryption for read only users".
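The intended fallback can be pictured as a simple choice of parent directory. This is a hypothetical Python illustration, not the actual fix: choose_staging_parent, the is_writable callback and the /user/bi/data path are invented for the example, and the exact directory Hive uses under /tmp is not specified in this thread.

```python
from typing import Callable


def choose_staging_parent(table_location: str,
                          is_writable: Callable[[str], bool],
                          fallback_root: str = "/tmp") -> str:
    """Hypothetical sketch: pick where a query's staging directory should live.

    Prefer the table's own location, but fall back to `fallback_root` when
    the querying user cannot write there, e.g. a read-only user such as bi
    querying /user/flume/logs.
    """
    if is_writable(table_location):
        return table_location
    return fallback_root


# Hypothetical setup: user bi can write only to its own directory.
bi_writable = {"/user/bi/data"}.__contains__
print(choose_staging_parent("/user/flume/logs", bi_writable))  # /tmp
print(choose_staging_parent("/user/bi/data", bi_writable))     # /user/bi/data
```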

 

We'll fix this ASAP.

 

Brock

New Contributor

It looks like DISTRO-682 describes the same problem:

 

Simple queries fail with permissions error on scratch dir when querying read-only tables in HDFS

 

Should the two be linked, and the latter resolved?

 

Thanks,

 

Thad