How to properly delete a Hive table through HUE?

Contributor

When I have a table in my sqoop schema and want to delete it, I go to the Hive editor (through HUE) and run the following command: DROP TABLE IF EXISTS schemaName.tblName PURGE;

After that, the table disappeared from the HUE GUI (sqoop table list, metastore list), but the actual files of the table were not deleted from HDFS.

On the other hand, when I do the same thing in the Impala editor, everything works fine.
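
For reference, this is how I verify that the files are still there (assuming the default warehouse location; schemaName and tblName are placeholders as above):

# List the table directory after the DROP; if the drop removed the data,
# this should fail with "No such file or directory"
hdfs dfs -ls /user/hive/warehouse/schemaName.db/tblName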

 

Why does this happen, and how can I drop/remove a table from the Hive metastore and HDFS all at once?

 

Thanks in advance.

 

 

 

8 REPLIES

Rising Star

Hi @eMazarakis,

As far as I know, Impala does not support HDFS impersonation (for security reasons, I guess), which means you can't delete HDFS files as the Hue user.

Cheers!

Contributor

@Bildervic I'm afraid it is not clear what I am saying. I have a table which is owned by the impala user. When I drop the table through the Impala editor (HUE) with the above command, the table and its HDFS files are deleted.

 

But when I have a table which is owned by the hive user and I try to delete it through the Hive editor (HUE) with the same command, the table disappears from the GUI but the files are not deleted.

 

And I am asking: why does this happen?


Hi, 

It looks like a permission issue which is silently ignored. Can you please post the ACLs from the HDFS path: the root folder where the table is stored, and the ACLs of the table path as well.

Maybe you can also check the Hive server's log to see if there is any kind of permission issue.

And finally, I would check the NameNode logs; if the file is not deleted because of missing permissions, it will show up there as a log message.
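
Something like this should do it (the warehouse path is the default one; adjust it to where your table actually lives, <db> and <table> are placeholders):

# ACLs of the root folder where the table is stored
hdfs dfs -getfacl /user/hive/warehouse/<db>.db
# ACLs of the table path itself
hdfs dfs -getfacl /user/hive/warehouse/<db>.db/<table>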

Tomas

 

Contributor

@Tomas79 here is the ACL of the HDFS path of the dir/table:
# file: /user/hive/warehouse/sqoop.db/sr_tt
# owner: hive
# group: hive
# flags: --t
user::rwx
group::rwx
other::rwx

Now if I go to the Hive editor (HUE) and execute the query DROP TABLE sr_tt PURGE; the files are not deleted.

 

Also, do you know where I can find the NameNode logs and the Hive server's log?

 

 

Super Guru
Hi,

NN log is under /var/log/hadoop-hdfs and HS2 under /var/log/hive by default. If you use Cloudera Manager, you can check in CM's configuration page for both of them.
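
For example, something like this should surface any permission errors (exact log file names vary between deployments, so just search the whole directory):

# Search the NameNode and HiveServer2 logs for permission problems
grep -ri "permission denied" /var/log/hadoop-hdfs/
grep -ri "permission denied" /var/log/hive/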

To determine why the directory was not removed:

- check whether the table is a managed or an external table, as dropping an external table will not delete the HDFS files (see the sketch below)
- check the HMS server log, also under /var/log/hive on the HMS host, for any permission errors, as it is HMS's responsibility to create and delete HDFS paths
- please also confirm whether you have Hive impersonation enabled (under CM > Hive > Configuration > "HiveServer2 Enable Impersonation" if you are using CM)
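
For the first point, a quick way to check from the command line (a sketch; the JDBC URL is a placeholder for your HS2 host):

# Show the table details; look for the "Table Type:" row
# (MANAGED_TABLE vs EXTERNAL_TABLE) in the output
beeline -u "jdbc:hive2://<hs2-host>:10000" -e "DESCRIBE FORMATTED sqoop.sr_tt;"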

Cheers
Eric


@eMazarakis I noticed the "-t" flag on the directory. That is the sticky bit; every time you see it in the HDFS ACLs, it tells you that only the owner is able to drop the directory, nobody else (even if write access is granted).
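
You can also see it in a plain directory listing:

# While the sticky bit is set, the permission string ends in "t" (e.g. drwxrwxrwt)
hdfs dfs -ls -d /user/hive/warehouse/sqoop.db/sr_tt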

 

So in your case only the hive user can remove this directory. Maybe, as @EricL pointed out, you have impersonation enabled, so the query is running under a different user.

Either way, you need to search for this permission issue in the logs.

 

Contributor

@EricL  My findings are the following: 

 

  1. The table is managed.
  2. Also, "HiveServer2 Enable Impersonation" is set to TRUE. I do not understand what this option does. Can you explain it to me?
  3. In the Hive Metastore Server log I can't see anything about permissions.

Super Guru
Hi,

"HiveServer2 Enable Impersonation" set to TRUE is probably the reason. When impersonation is true, it means Hive will impersonate the end user who runs the query when submitting jobs. Your ACL output showed that the directory is owned by "hive:hive" and, as @Tomas79 found out, you have the sticky bit set, so if Hive impersonates the end user, the user who runs the query will not be able to delete the path as he/she is not the owner. If impersonation is OFF, then HS2 will run the query as the "hive" user (the user that runs the HS2 process), and then you should not see this issue.

I assume you have no Sentry? Sentry requires impersonation to be OFF on the HS2 side, so that all queries run under the "hive" user.
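
If you want to double-check what HS2 is actually using, the CM setting maps to the hive.server2.enable.doAs property; a quick check could look like this (a sketch; the JDBC URL is a placeholder):

# Print the effective impersonation setting for the session
beeline -u "jdbc:hive2://<hs2-host>:10000" -e "SET hive.server2.enable.doAs;"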

To test the theory, try removing the sticky bit on this path and then dropping the table again in Hive.
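
For example (a sketch; 0777 sets the directory back to plain rwxrwxrwx without the sticky bit, and you need to run it as a user allowed to change the permissions, such as hdfs):

# Clear the sticky bit on the table directory
hdfs dfs -chmod 0777 /user/hive/warehouse/sqoop.db/sr_tt
# Verify the "t" flag is gone, then retry DROP TABLE sr_tt PURGE; from the Hive editor
hdfs dfs -getfacl /user/hive/warehouse/sqoop.db/sr_tt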

Cheers
Eric