HUE Metastore Manager - Drop Table not deleting files in HDFS

Contributor

When dropping a table from the Metastore Manager in HUE, the underlying HDFS files are not removed, which means users can still query the table (tested with Impala). The table was created using the Metastore Manager, and the data was added by running a Spark Action in Oozie (LOAD DATA INPATH... kv1.txt... INTO TABLE...)

 

While logged in as a HUE superuser, I tried deleting the Hive folder corresponding to the table I wanted to remove, but I received a permission error:

 

Cannot perform operation. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".

AccessControlException: Permission denied by sticky bit: user=cloudera, path="/user/hive/warehouse/hivetest2":cloudera2:hive:drwxrwxrwt, parent="/user/hive/warehouse":hive:hive:drwxrwxrwt (error 500)

 

What do I need to configure so that a HUE superuser can delete from Hive via the File Browser?

What do I need to set so that dropping a table from the Metastore Manager deletes the HDFS files?

 

 

1 ACCEPTED SOLUTION

Champion

@jpayne1

 

If one user creates a table and loads data into it, and a different user then drops the table, only the table definition is dropped; the underlying data remains in HDFS.

Example, use case 1:
1. Log in as User A, create a table tab1, and load data into it.
2. Drop the table tab1. The table is dropped and its files are removed from the HDFS path.

Use case 2:
1. Log in as User A, create a table tab1, and load data into it.
2. Log out from User A and log in as User B.
3. Drop the table tab1. The table is dropped, but its files remain in the HDFS path.
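
One plausible reading of the two use cases, given the sticky bit (drwxrwxrwt) shown in the error message in the question, is that the HDFS delete runs as the logged-in user and is rejected for anyone who does not own the table directory. That the drop is executed as the logged-in user is an assumption here, not something confirmed in this thread. The Python sketch below only models the HDFS sticky-bit rule (superuser, directory owner, or file owner may delete); `Entry` and `may_delete` are hypothetical names for illustration, and the owners are taken from the error message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Minimal stand-in for an HDFS path: its owner and whether the sticky bit is set."""
    owner: str
    sticky: bool = False


def may_delete(user, target, parent, superusers=frozenset({"hdfs"})):
    """Model of the sticky-bit rule only.

    Assumes the ordinary write-permission check on the parent already passes,
    which holds for the rwxrwxrwx mode shown in the error message.
    """
    if not parent.sticky:
        return True
    return user in superusers or user == target.owner or user == parent.owner


# Owners and modes as reported in the AccessControlException.
warehouse = Entry(owner="hive", sticky=True)  # /user/hive/warehouse
table_dir = Entry(owner="cloudera2")          # /user/hive/warehouse/hivetest2

print(may_delete("cloudera2", table_dir, warehouse))  # True: owner of the table directory
print(may_delete("cloudera", table_dir, warehouse))   # False: neither owner nor superuser
print(may_delete("hdfs", table_dir, warehouse))       # True: HDFS superuser
```

Under this model the second call corresponds to the "Permission denied by sticky bit" error: cloudera is a Hue admin, but to HDFS it is just another non-owner.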

4 REPLIES

Contributor

Thanks for the quick reply. I have another case.

 

Use Case 3:

1. Login as User A and create a table tab1 and load data into it
2. Logout from User A and login as User B
3. As User B, load data into table tab1

 

Now if User A drops the table, will it also delete the file User B loaded?

 

UPDATE: Just tested this and can confirm User B's loaded files will be deleted as well if User A drops the table.

Champion

Was the table an internal (managed) or external (unmanaged) table? Dropping the former deletes both the metadata and the underlying data in HDFS. Dropping the latter deletes only the metadata.

 

As for removing the data now, you need to be an HDFS superuser. You logged into HUE as cloudera, which is not one. The easiest way is through the command line: switch to the hdfs user and then run the delete command. This requires shell access and sudo access to hdfs, which you may not have. In lieu of that, you could create an hdfs user in HUE (assuming no auth) and then log in as it. This is risky, though, because the user then exists in the HUE database, and anybody who gains access to it will have root-level access to HDFS. Alternatively, update the HDFS configs to include the cloudera account as an HDFS superuser (https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_sg_s5_hdfs_principal.html).
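
For the command-line route, a minimal sketch of what "switch to the hdfs user and run the command" could look like, wrapped in Python so it can be dry-run first. It assumes `sudo` access to the `hdfs` account and the standard `hdfs dfs -rm -r` shell command; `build_rm_command` and `remove_as_hdfs` are hypothetical helper names, and the path is the one from the error message in the question.

```python
import shlex
import subprocess


def build_rm_command(path, skip_trash=False):
    """Build the argv for a recursive HDFS delete run as the hdfs superuser."""
    if not path.startswith("/") or path.rstrip("/") == "":
        raise ValueError("refusing to delete a relative path or the HDFS root")
    cmd = ["sudo", "-u", "hdfs", "hdfs", "dfs", "-rm", "-r"]
    if skip_trash:
        cmd.append("-skipTrash")  # bypass the HDFS trash; the delete is then permanent
    cmd.append(path)
    return cmd


def remove_as_hdfs(path, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = build_rm_command(path)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd).returncode


# Dry run only: prints the command instead of executing it.
remove_as_hdfs("/user/hive/warehouse/hivetest2")
# prints: sudo -u hdfs hdfs dfs -rm -r /user/hive/warehouse/hivetest2
```

Keeping the dry run as the default is deliberate: anything executed as hdfs bypasses permission checks, so it is worth reading the exact command before passing `dry_run=False`.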

Contributor

It is an internal table. The creation process was using the HUE GUI to 'Create a new table manually' in the Metastore Manager for the Hive default database. I didn't choose the 'Create a new table from a file' option, which allows a user to specify if it should be an external table. 

 

I updated my reply to saranvisa's use cases, and the underlying HDFS files were deleted only if the HUE user who dropped the table was its creator.

 

Fortunately, I do have access to HDFS superuser via the command line and was able to delete the table from my prior incident. Thanks for providing an alternative in the event that is not the case, especially since when deployed most users won't have command line access let alone HDFS superuser. Sounds like the trade-off is ease of use vs. level of security.