
Hive drop partitions using range impacts metastore

Explorer

I'm using Hadoop 2.6.0-cdh5.14.2 and Hive 1.1.0-cdh5.14.2.
This system contains a huge external table with 183K+ partitions, and the command:

 

0: jdbc:hive2://hiveserver2.hd.docomodigital.> drop table unifieddata_work.old__raw_ww_eventsjson

does not work: the metastore does not reply within 600 seconds and the task ends with an error.

I attempted to delete the partitions by range:

 

0: jdbc:hive2://hiveserver2.hd.docomodigital.> alter table unifieddata_work.old__raw_ww_eventsjson drop PARTITION (country='ae', year='2017', month='01', day>'29', hour > '00' );
INFO : Compiling command(queryId=hive_20180821140909_ba6c4bb0-d0de-4fd3-a5ec-47e217289c6b): alter table unifieddata_work.old__raw_ww_eventsjson drop PARTITION (country='ae', year='2017', month='01', day>'29', hour > '00' )
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20180821140909_ba6c4bb0-d0de-4fd3-a5ec-47e217289c6b); Time taken: 0.612 seconds
INFO : Executing command(queryId=hive_20180821140909_ba6c4bb0-d0de-4fd3-a5ec-47e217289c6b): alter table unifieddata_work.old__raw_ww_eventsjson drop PARTITION (country='ae', year='2017', month='01', day>'29', hour > '00' )
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=01
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=02
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=03
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=04
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=05
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=06
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=07
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=08
INFO : Dropped the partition country=ae/year=2017/month=01/day=30/hour=09
... output truncated ...

It works, but something bad happens to the metastore: the canary stops working. Any idea how to solve the issue? Is there an alternative way to delete such a big table?

Re: Hive drop partitions using range impacts metastore

Master Collaborator

Hi,

Try logging into the metastore database and manually removing the table from the metadata tables.

I think the table that holds the list of tables is TBLS. You should also remove the records from the child tables, such as the ones for columns and locations.

 

Then restart the metastore, and it should be OK.

 

As this is an external table, you will not remove the data with this action.
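For reference, the manual cleanup described above could look roughly like the SQL below. This is a hedged sketch only: it assumes a MySQL-backed metastore and the standard Hive metastore schema (TBLS, PARTITIONS, and their child tables), and `1234` is a placeholder TBL_ID you must resolve first. Back up the metastore database and stop the metastore service before attempting anything like this.

```shell
#!/bin/sh
# Sketch of the manual metastore cleanup (assumption: MySQL-backed metastore,
# standard Hive metastore schema). 1234 is a placeholder TBL_ID -- resolve it
# with the first SELECT before running any DELETE. Back up the DB first.
CLEANUP_SQL=$(cat <<'SQL'
-- 1. Resolve the internal id of the table.
SELECT TBL_ID FROM TBLS WHERE TBL_NAME = 'old__raw_ww_eventsjson';

-- 2. Delete partition child rows first, then the partitions themselves
--    (replace 1234 with the TBL_ID found above).
DELETE FROM PARTITION_KEY_VALS
 WHERE PART_ID IN (SELECT PART_ID FROM PARTITIONS WHERE TBL_ID = 1234);
DELETE FROM PARTITION_PARAMS
 WHERE PART_ID IN (SELECT PART_ID FROM PARTITIONS WHERE TBL_ID = 1234);
DELETE FROM PARTITIONS WHERE TBL_ID = 1234;

-- 3. Then the table-level rows.
DELETE FROM PARTITION_KEYS WHERE TBL_ID = 1234;
DELETE FROM TABLE_PARAMS WHERE TBL_ID = 1234;
DELETE FROM TBLS WHERE TBL_ID = 1234;
SQL
)
# Each partition (and the table itself) also owns a storage-descriptor row in
# SDS, referenced via SD_ID; strictly those (and SERDES/SERDE_PARAMS) should
# be cleaned up too -- omitted here for brevity.
# Review the statements before piping them into the metastore DB, e.g.:
#   printf '%s\n' "$CLEANUP_SQL" | mysql metastore
printf '%s\n' "$CLEANUP_SQL"
```

Since the table is external, none of this touches the data in HDFS; only the metadata goes away.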

 

Re: Hive drop partitions using range impacts metastore

Explorer
It's not a solution we can adopt in a production environment, where the same problem can occur again.
Any other ideas?

Re: Hive drop partitions using range impacts metastore

Master Collaborator

I understand. But how many times do you create and drop a table with 180k partitions? It's a matter of a simple script. But maybe you are right: the metastore should handle bigger timeouts.
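For what it's worth, such a "simple script" could be sketched as a loop that generates one DROP PARTITION statement per day, so each metastore call removes at most 24 hourly partitions and stays well under the timeout. The table name and partition layout are taken from the question; the one-day batch granularity and the beeline invocation are assumptions you may need to tune.

```shell
#!/bin/sh
# Sketch: drop a huge partitioned table in small batches so each metastore
# call stays small. Partition layout (country/year/month/day/hour) comes
# from the question; one day per statement is an assumed batch size.
TABLE="unifieddata_work.old__raw_ww_eventsjson"

# Emit one ALTER TABLE ... DROP PARTITION statement per day of one month.
# Omitting 'hour' from the spec makes each statement drop all (up to 24)
# hourly partitions under that day.
gen_drop_statements() {
    country="$1"; year="$2"; month="$3"
    day=1
    while [ "$day" -le 31 ]; do
        printf "ALTER TABLE %s DROP IF EXISTS PARTITION (country='%s', year='%s', month='%s', day='%02d');\n" \
            "$TABLE" "$country" "$year" "$month" "$day"
        day=$((day + 1))
    done
}

# Review the statements, then pipe them into beeline, e.g.:
#   gen_drop_statements ae 2017 01 | beeline -u <jdbc-url> -f /dev/stdin
gen_drop_statements ae 2017 01
```

Once all partitions are gone in batches, the final `DROP TABLE` only has the bare table left to remove, so it should complete within the timeout.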

Re: Hive drop partitions using range impacts metastore

Explorer
There are HUGE tables with a lot of partitions; this case is not unique. Partitioning helps address the slice of data that matters, and each partition contains a lot of data. Oh well... that's Big Data, after all.