
Drop table not working as expected in Hive

Contributor

Hello guys!

 

 I hope your day is going well.

 

 Something really weird (at least for me) is happening when I drop a table in Hive.

 

1. A table called "Test" is created by a sqoop import job using "--create-hcatalog-table". See the full job below.
2. Once the import job completes (with a select * from the source), I drop the table without issues:
drop table test purge;
At this point I expect that all the information is deleted, gone, bye, finito, adios!
3. I run the same sqoop job again, and it not only loads the table with the fresh information from the source; it also shows me the previous information from before the drop table!
 
Some people have told me that's because the table has been created as an external table by the sqoop job. But even if it is an external table, a drop table <table> purge; should get rid of the data, shouldn't it?
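
For what it's worth, the table type can be checked directly (a quick sketch using the table name from this thread; the "Table Type:" row will say MANAGED_TABLE or EXTERNAL_TABLE):

describe formatted test;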
 
Please let me know your thoughts.
 
Sqoop job:
sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
--username 'lrm0613' \
--password-alias sqlserver2.password \
--query 'select * from JobHistory where $CONDITIONS' \
--hcatalog-database dataengsandbox \
--hcatalog-table Test \
--compress \
--compression-codec snappy \
--map-column-hive jobid=INT \
-m 1 \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as parquet'
1 ACCEPTED SOLUTION

Master Collaborator

Hi @Heri,

 

I just want to add a few points here. You can use the PURGE option to delete the data files along with the partition metadata, but it works only on internal/managed tables:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec PURGE;

 

External tables, however, require a two-step process: drop the partition, then remove the files yourself:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec;

hdfs dfs -rm -r <partition file path>

 

I hope this gives you some insight. cc @aakulov


11 REPLIES

Master Collaborator

Hi Heri,

 

After you execute 

drop table test purge;

can you check whether the data is actually deleted? Do a query on the table first, but also check with hdfs dfs to see whether the underlying files have been deleted from Hadoop (they should be). Let me know what you see.

 

You may be right that an EXTERNAL table's data does not get deleted, only its metadata. That's why I'm asking you to check for data with hdfs dfs. Now, to be able to drop the EXTERNAL table (both metadata and data) you'd need to follow the steps here: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive_drop_external_table_...
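
A minimal check sequence along these lines (a sketch; the database and table names are from this thread, and the HDFS path has to be taken from the Location row of the describe output):

-- note the table's HDFS location before dropping it
describe formatted dataengsandbox.test;

drop table dataengsandbox.test purge;

-- metadata check: should return no rows
show tables in dataengsandbox like 'test';

# data check: should fail with "No such file or directory"
hdfs dfs -ls <location from describe formatted>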

 

Hope that helps.

 

Contributor

Thank you for responding!

 

We've verified it, and the underlying files are not being deleted from Hadoop.

 

(screenshot of the hdfs dfs output attached)

 

Master Collaborator

Ok, so to be able to purge the table created by sqoop, because it is external, you'll need to add this to your stanza:

--hcatalog-storage-stanza 'stored as parquet TBLPROPERTIES("external.table.purge"="true")'

Then, when you load the data for the first time, purging will be enabled on that table. The purge command you have will then remove both the metadata and the data in the external table.
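
If the table already exists, the same property can also be set after creation (a sketch using the table name from this thread):

ALTER TABLE test SET TBLPROPERTIES ('external.table.purge'='true');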

 

Let me know if that works and if the solution is acceptable.

Contributor

Thanks for your answer! This is a good learning experience for me.

I didn't know we could set a table's properties when creating it.

 

Unfortunately, the table data still persists after the drop table command.

 

table name: test

When we create the table test, it is stored in:  /data/<database>/test

When we use a different database where the issue is not present, the table is stored in /data/<database>/hive/test

 

Could it be a database configuration issue?
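
One way to compare the two databases' configured locations (a sketch; <database> stands for each database name):

describe database extended <database>;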

Master Collaborator

The issue is that the DROP TABLE statement doesn't seem to remove the data from HDFS. This usually happens when the table is an external table, which Hive does not fully manage.

 

Another thing you can try is what's suggested in this thread (i.e. before you drop the table, change its property to be EXTERNAL=FALSE). Does that work for you?

Contributor

Thanks,

 

I gave it a try, but it still keeps the data from before the table was dropped.

select count(*) from dataengsandbox.test;
ALTER TABLE test SET TBLPROPERTIES('EXTERNAL'='False');
drop table test purge;
select count(*) from dataengsandbox.test; -- returned double the number of records

describe formatted test; 

 

Location:            hdfs://nameservice1/data/DataEngSandbox/test
Table Type:          MANAGED_TABLE
Table Parameters:
  COLUMN_STATS_ACCURATE   false
  external.table.purge    true
  numFiles                1
  numRows                 -1
  rawDataSize             -1
  totalSize               19098747
  transient_lastDdlTime   1590587689

# Storage Information
SerDe Library:       org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:         org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:        org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed:          No
Num Buckets:         -1
Bucket Columns:      []
Sort Columns:        []
Storage Desc Params:

 

Master Collaborator

Hmmm... I'm stumped. Perhaps @Shu_ashu can help?

Master Collaborator

Hi @Heri,

 

I just want to add a few points here. You can use the PURGE option to delete the data files along with the partition metadata, but it works only on internal/managed tables:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec PURGE;

 

External tables, however, require a two-step process: drop the partition, then remove the files yourself:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec;

hdfs dfs -rm -r <partition file path>

 

I hope this gives you some insight. cc @aakulov
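
Applied to the (non-partitioned) table in this thread, the equivalent two-step cleanup would look roughly like this (a sketch; the HDFS path is taken from the DESCRIBE FORMATTED output above):

drop table dataengsandbox.test;

hdfs dfs -rm -r /data/DataEngSandbox/test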

Contributor

Thanks for your answer!

 

That solution might work, but we've also run a drop table in a different database with the same job, and there it didn't require deleting the files manually. At this point we suspect that something is wrong in the database configuration.

 

We've asked our DBA team to re-create the database, because it wasn't created with the same procedure as the database that works properly.

 

 Thank you very much for the help. It's really appreciated.