Contributor
Posts: 35
Registered: ‎12-13-2013
Accepted Solution

Very slow catalog update after insert overwrite

Could anyone point me to where I should look to find out why the catalog update takes so long after an insert overwrite? From the profile I can see that the data was written in about 7s, but it took another 107s to update the catalog.

 

Remote fragments started: 2,249,771,266
DML data written: 7,268,752,004
DML Metastore update finished: 114,757,731,166
Request finished: 114,786,664,689

...

- MetastoreUpdateTimer: 107,517,902,343
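
To make the gap easier to see, here is a rough sketch that turns the timeline above into per-phase durations. It assumes the profile values are cumulative nanoseconds since the query started (consistent with the ~7s and ~107s figures); the `phase_durations` helper is only illustrative and not part of Impala.

```python
# Cumulative timeline events copied from the profile above.
# Assumption: the values are nanoseconds since the start of the query.
timeline_ns = [
    ("Remote fragments started", 2_249_771_266),
    ("DML data written", 7_268_752_004),
    ("DML Metastore update finished", 114_757_731_166),
    ("Request finished", 114_786_664_689),
]


def phase_durations(events):
    """Return (label, seconds) pairs; each phase ends at the named event."""
    durations = []
    previous = 0
    for label, timestamp in events:
        # Duration of a phase is the difference between consecutive events.
        durations.append((label, (timestamp - previous) / 1e9))
        previous = timestamp
    return durations


for label, seconds in phase_durations(timeline_ns):
    print(f"{label}: {seconds:.2f}s")
```

Under that assumption, the step ending at "DML Metastore update finished" accounts for roughly 107.49s, which lines up with the MetastoreUpdateTimer value.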

 

It uses dynamic partitioning, though I don't think that's a factor here:

 

insert overwrite table action_fact_a partition(p_action_date_ym='201511',p_campaign_id_mod=5, p_publisher_id_mod) select field1, etc....

 

There are 10 p_publisher_id_mod partitions, so 10 files are generated, each only a couple of MB, for no more than 30MB altogether, and I don't think more than 10 blocks were deleted and 10 inserted. The cluster is not particularly under load, and the performance of this operation is pretty stable. There are 10 DNs. The source table is text; the target (partitioned) table is Parquet.

 

Much appreciated!

Cloudera Employee
Posts: 25
Registered: ‎11-12-2014

Re: Very slow catalog update after insert overwrite

Hi Mauricio, 

 

You're probably running into IMPALA-1480. If a table has a substantial number of partitions (>10K), it takes a long time to perform certain DDL operations, even though only a small fraction of the metadata changes.

 

Dimitris

Contributor
Posts: 35
Registered: ‎12-13-2013

Re: Very slow catalog update after insert overwrite

Thanks, Dimitris. I've commented on the ticket and upvoted it. Hope you guys can get it in progress soon!