12-01-2015 04:08 PM
Could anyone point me to where I should look to find out why the catalog update takes so long after an insert overwrite? From the profile I can see that the data was written in about 7s, but it took another 107s to update the catalog.
Remote fragments started: 2,249,771,266
DML data written: 7,268,752,004
DML Metastore update finished: 114,757,731,166
Request finished: 114,786,664,689
- MetastoreUpdateTimer: 107,517,902,343
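For anyone reading the numbers above, here is a rough sketch of the arithmetic behind the "7s" and "107s" figures. It assumes the timeline values are nanoseconds since query start, which is not stated in the profile excerpt but matches the durations quoted in the post. The helper name `phase_durations` is made up for illustration.

```python
# Timeline events copied from the profile excerpt above.
# ASSUMPTION: values are nanoseconds elapsed since the query started.
NS_PER_S = 1_000_000_000

timeline_ns = {
    "Remote fragments started": 2_249_771_266,
    "DML data written": 7_268_752_004,
    "DML Metastore update finished": 114_757_731_166,
    "Request finished": 114_786_664_689,
}


def phase_durations(timeline):
    """Return (event, seconds since the previous event) pairs in timeline order."""
    durations = []
    previous = 0
    for event, timestamp in timeline.items():
        durations.append((event, (timestamp - previous) / NS_PER_S))
        previous = timestamp
    return durations


durations = phase_durations(timeline_ns)
for event, seconds in durations:
    print(f"{event}: +{seconds:.2f}s")
```

The gap between "DML data written" and "DML Metastore update finished" comes out within a few tens of milliseconds of the reported MetastoreUpdateTimer, so almost all of the wall-clock time is spent in the metastore/catalog update rather than in writing data.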
It uses dynamic partitioning, though I don't think that's a factor here:
insert overwrite table action_fact_a partition(p_action_date_ym='201511',p_campaign_id_mod=5, p_publisher_id_mod) select field1, etc....
There are 10 p_publisher_id_mod partitions so 10 files generated, each only a couple MB, so no more than 30MB altogether, and I don't think any more than 10 blocks were deleted and 10 inserted. Cluster is not particularly under load and performance of this operation is pretty stable. 10 DNs. Source table is text, target (partitioned) is parquet.
12-02-2015 05:42 PM
You're probably running into IMPALA-1480. If a table has a substantial number of partitions (>10K), it takes a long time to perform certain DDL operations even though only a small fraction of the metadata changes.