Very slow catalog update after insert overwrite

Rising Star

Could anyone point me to where I should look to find out why the catalog update takes so long after an insert overwrite? From the profile I can see that the data was written in about 7s, but it took another 107s to update the catalog.

 

Remote fragments started: 2,249,771,266
DML data written: 7,268,752,004
DML Metastore update finished: 114,757,731,166
Request finished: 114,786,664,689

...

- MetastoreUpdateTimer: 107,517,902,343
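To make the breakdown explicit, here is a minimal sketch that turns the timeline events above into per-phase durations. It assumes the raw values are cumulative nanosecond offsets from query start, which is consistent with the "about 7s" and "107s" figures quoted in the question; `timeline_ns` and `phase_durations` are illustrative names, not part of any Impala tooling.

```python
# Timeline events copied from the profile above.
# Assumption: each value is a cumulative offset in nanoseconds since query start.
timeline_ns = [
    ("Remote fragments started", 2_249_771_266),
    ("DML data written", 7_268_752_004),
    ("DML Metastore update finished", 114_757_731_166),
    ("Request finished", 114_786_664_689),
]


def phase_durations(events):
    """Return (label, seconds) pairs, each measured from the previous event."""
    durations = []
    previous = 0
    for label, offset in events:
        durations.append((label, (offset - previous) / 1e9))
        previous = offset
    return durations


for label, seconds in phase_durations(timeline_ns):
    print(f"{label:<32}{seconds:8.2f} s")
```

Under that assumption, the gap between "DML data written" and "DML Metastore update finished" comes to roughly 107.5 s, which agrees with the MetastoreUpdateTimer counter and confirms that nearly all of the wall-clock time is spent in the metastore/catalog update rather than in writing data.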

 

It uses dynamic partitioning, though I don't think that's a factor here:

 

insert overwrite table action_fact_a partition(p_action_date_ym='201511',p_campaign_id_mod=5, p_publisher_id_mod) select field1, etc....

 

There are 10 p_publisher_id_mod partitions, so 10 files are generated, each only a couple of MB, so no more than 30 MB altogether, and I don't think any more than 10 blocks were deleted and 10 inserted. The cluster is not particularly under load, and the performance of this operation is pretty stable. The cluster has 10 DataNodes. The source table is text; the target (partitioned) table is Parquet.

 

Much appreciated!

1 ACCEPTED SOLUTION

Contributor

Hi Mauricio, 

 

You're probably running into IMPALA-1480. If a table has a substantial number of partitions (>10K), it takes a long time to perform certain DDL operations, even though only a small fraction of the metadata changes.

 

Dimitris

Rising Star

Thanks Dimitris.  I've commented in the ticket and upvoted it.  Hope you guys can get it in progress soon!