Expert Contributor
Posts: 123
Registered: ‎07-17-2017

BDR - problem of using invalidate metadata in Impala replication

Hi,

As mentioned in "Replicating Data to Impala Clusters", after the HDFS/Hive replication we need to execute

INVALIDATE METADATA;

to complete the Impala replication.
But this statement forces the metadata to be reloaded from the Hive metastore, which can cause a big latency hit on the first query against partitioned tables. Is there a solution to this?

Thanks in advance.

Explorer
Posts: 23
Registered: ‎02-19-2018

Re: BDR - problem of using invalidate metadata in Impala replication

>  we need to execute

INVALIDATE METADATA;

Oh - I thought that was done for you if you select the "Invalidate Impala Metadata on Destination" option.

Are you invalidating all metadata across the board or just the tables that you know you update?

 

In any case *something* has to reload/recalculate that metadata. If you don't want it to be the first real user, then I would try running some sort of query on the table yourself right after the invalidation. That way you take the hit of the initial latency instead of your users.
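
As a rough sketch of what I mean (not tested against a real cluster): a small Python helper that prints, for each replicated table, a table-scoped INVALIDATE METADATA followed by a cheap warm-up query. The table names are made up, and I am assuming that a trivial SELECT on the table is enough to trigger the metadata reload.

```python
def warmup_statements(tables):
    """Build per-table Impala statements: invalidate, then touch the table."""
    statements = []
    for table in tables:
        # Table-scoped invalidation instead of the global INVALIDATE METADATA;
        statements.append(f"INVALIDATE METADATA {table};")
        # Assumption: any query that references the table forces the reload,
        # so the warm-up pays the latency instead of the first real user.
        statements.append(f"SELECT 1 FROM {table} LIMIT 1;")
    return statements


if __name__ == "__main__":
    # Hypothetical table names, replace with the tables your job replicates.
    for stmt in warmup_statements(["sales_db.events", "sales_db.orders"]):
        print(stmt)
```

You could save the output to a file and run it with impala-shell after each replication job. It does not remove the reload cost, it only moves it out of the users' way.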

 

I hope that helps. I am currently learning about how to use BDR so I am no expert.

Expert Contributor
Posts: 123
Registered: ‎07-17-2017

Re: BDR - problem of using invalidate metadata in Impala replication

Thanks @alexmc6 for your reply,

The note in the documentation is clear:

"you must run the Impala INVALIDATE METADATA statement on the destination cluster to prevent queries from failing".

I have some tables with a large number of partitions, and I use them in real-time cases, so there is no time to run a first query and wait a few minutes.
