Created on 04-24-2017 07:22 AM - edited 09-16-2022 04:30 AM
When dropping a table from the Metastore Manager in HUE, the underlying HDFS files are not removed, which means users can still query the table (tested with Impala). The table was created using the Metastore Manager, and the data was added by running a Spark Action in Oozie (LOAD DATA INPATH... kv1.txt... INTO TABLE...)
While logged in as a HUE superuser, I tried deleting the Hive folder corresponding to the table I wanted to remove, but I received a permission error:
Cannot perform operation. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".
AccessControlException: Permission denied by sticky bit: user=cloudera, path="/user/hive/warehouse/hivetest2":cloudera2:hive:drwxrwxrwt, parent="/user/hive/warehouse":hive:hive:drwxrwxrwt (error 500)
What do I need to configure so that a HUE superuser can delete from Hive via the File Browser?
What do I need to set so that dropping a table from the Metastore Manager deletes the HDFS files?
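Editorial note on the error above: the trailing `t` in `drwxrwxrwt` is the sticky bit. In a sticky directory, HDFS lets only the superuser, the directory owner, or the entry's owner delete or move an entry. The Python sketch below parses the quoted message and applies that rule. It is a simplified model for illustration, not Hadoop's implementation: the names `Inode`, `parse_denial` and `sticky_bit_allows_delete` are hypothetical, the superuser check is reduced to a list of names (supergroup membership is not modelled), and ordinary write permission on the parent is not checked.

```python
import re
from dataclasses import dataclass

# The message quoted in the post above.
MESSAGE = (
    'AccessControlException: Permission denied by sticky bit: user=cloudera, '
    'path="/user/hive/warehouse/hivetest2":cloudera2:hive:drwxrwxrwt, '
    'parent="/user/hive/warehouse":hive:hive:drwxrwxrwt (error 500)'
)


@dataclass
class Inode:
    path: str
    owner: str
    group: str
    mode: str  # e.g. "drwxrwxrwt"

    @property
    def sticky(self) -> bool:
        # A trailing "t" or "T" in the mode string marks the sticky bit.
        return self.mode[-1] in ("t", "T")


def parse_denial(message):
    """Extract the user, the target and its parent from a sticky-bit denial."""
    pattern = (
        r'user=(?P<user>[^,]+), '
        r'path="(?P<path>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>[^,]+), '
        r'parent="(?P<ppath>[^"]+)":(?P<powner>[^:]+):(?P<pgroup>[^:]+):(?P<pmode>\S+)'
    )
    m = re.search(pattern, message)
    if m is None:
        raise ValueError("unrecognised message format")
    target = Inode(m["path"], m["owner"], m["group"], m["mode"])
    parent = Inode(m["ppath"], m["powner"], m["pgroup"], m["pmode"])
    return m["user"], target, parent


def sticky_bit_allows_delete(user, target, parent, superusers=("hdfs",)):
    """Model only the sticky-bit part of the delete check."""
    if not parent.sticky:
        return True  # no sticky bit: this particular check does not apply
    return user in superusers or user == target.owner or user == parent.owner


user, target, parent = parse_denial(MESSAGE)
for candidate in (user, target.owner, parent.owner, "hdfs"):
    allowed = sticky_bit_allows_delete(candidate, target, parent)
    print(f"{candidate}: {'allowed' if allowed else 'denied'}")
```

In this model `cloudera` is denied because it is neither the owner of `hivetest2` (`cloudera2`) nor the owner of the warehouse directory (`hive`), which matches the message HUE returned.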
Created 04-24-2017 08:45 AM
If one user creates a table and loads data into it, and a different user then drops the table, only the table definition is dropped; the underlying data remains in HDFS.
Ex: Use case 1:
1. Log in as User A, create a table tab1, and load data into it
2. Drop the table tab1. The table is dropped and its files are removed from the HDFS path
Use case 2:
1. Log in as User A, create a table tab1, and load data into it
2. Log out as User A and log in as User B
3. Drop the table tab1. The table is dropped, but its files remain in the HDFS path
Created on 04-25-2017 08:01 AM - edited 04-25-2017 08:15 AM
Thanks for the quick reply. I have another case.
Use Case 3:
1. Log in as User A, create a table tab1, and load data into it
2. Log out as User A and log in as User B
3. As User B, load data into table tab1
Now if User A drops the table, will it also delete the file User B loaded?
UPDATE: Just tested this and can confirm User B's loaded files will be deleted as well if User A drops the table.
Created 04-25-2017 08:02 AM
Was the table an internal (managed) or external (unmanaged) table? Dropping the former deletes both the metadata and the underlying data in HDFS. Dropping the latter deletes only the metadata.
As for removing the data now, you need to be an HDFS superuser. You logged into HUE as cloudera, which is not one. The easiest way is through the command line: switch to the hdfs user and then run the command. This requires shell access and sudo access to hdfs, which you may not have. In lieu of that, you could create an hdfs user in HUE (assuming no auth) and then log in as it. This is risky, though, because the user then exists in the HUE database, and anybody who gains access to it will have root-level access to HDFS. If you can do either of these, update the HDFS configs to include the cloudera account as an HDFS superuser (https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_sg_s5_hdfs_principal.html).
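Editorial sketch of the command-line route described above, assuming shell access and sudo rights to the `hdfs` account. It wraps `sudo -u hdfs hdfs dfs -rm -r <path>` in a small Python helper that prints the command by default instead of running it. The helper names (`build_rm_command`, `remove_table_dir`) and the warehouse-prefix guard are illustrative assumptions; the path is the one from the error message in the original post.

```python
import shlex
import subprocess

# Assumed warehouse location, taken from the error message in the thread.
WAREHOUSE = "/user/hive/warehouse/"


def build_rm_command(path, run_as="hdfs", skip_trash=False):
    """Build the argv for removing an HDFS directory as another OS user."""
    normalized = path.rstrip("/")
    # Guard: only allow paths strictly below the warehouse directory.
    if not normalized.startswith(WAREHOUSE) or ".." in normalized.split("/"):
        raise ValueError(f"refusing to delete outside the warehouse: {path}")
    cmd = ["sudo", "-u", run_as, "hdfs", "dfs", "-rm", "-r"]
    if skip_trash:
        # Bypasses the HDFS trash, so the delete cannot be undone.
        cmd.append("-skipTrash")
    cmd.append(normalized)
    return cmd


def remove_table_dir(path, dry_run=True):
    """Print the command (dry run) or execute it with subprocess."""
    cmd = build_rm_command(path)
    if dry_run:
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


# Dry run: shows the command without touching HDFS.
remove_table_dir("/user/hive/warehouse/hivetest2")
```

Keeping `dry_run=True` as the default means nothing is deleted until the printed command has been reviewed; passing `dry_run=False` requires a host with the `hdfs` client and the sudo rights mentioned above.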
Created 04-25-2017 08:36 AM
It is an internal table. The creation process was using the HUE GUI to 'Create a new table manually' in the Metastore Manager for the Hive default database. I didn't choose the 'Create a new table from a file' option, which allows a user to specify if it should be an external table.
I updated my reply to saranvisa's use cases, and the underlying HDFS files were deleted only if the HUE user who dropped the table was its creator.
Fortunately, I do have access to HDFS superuser via the command line and was able to delete the table from my prior incident. Thanks for providing an alternative in the event that is not the case, especially since when deployed most users won't have command line access let alone HDFS superuser. Sounds like the trade-off is ease of use vs. level of security.