As mentioned in Replicating Data to Impala Clusters, after the HDFS/Hive replication we need to execute the INVALIDATE METADATA statement on the destination cluster to accomplish the Impala replication.
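The post-replication step can be scripted so it runs as soon as the HDFS/Hive replication finishes. This is a minimal sketch, assuming `impala-shell` is installed on a machine that can reach the destination cluster. `-i` (impalad to connect to) and `-q` (run one statement and exit) are real impala-shell options. The host name and the helper names `impala_shell_cmd` and `run_on_destination` are hypothetical.

```python
import shlex
import subprocess


def impala_shell_cmd(impalad: str, statement: str) -> list[str]:
    """Build an impala-shell invocation that runs a single statement.

    -i selects the impalad to connect to; -q runs one query and exits.
    """
    return ["impala-shell", "-i", impalad, "-q", statement]


def run_on_destination(impalad: str, statement: str, dry_run: bool = True) -> list[str]:
    """Print the command (dry run) or execute it, failing loudly on error."""
    cmd = impala_shell_cmd(impalad, statement)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical destination impalad host; replace with one on your cluster.
    run_on_destination("dest-impalad.example.com", "INVALIDATE METADATA")
```

Defaulting to `dry_run=True` lets you check the exact command before pointing it at a production cluster.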
But this statement forces a reload of the Hive metastore metadata, and that can cause a large latency for the first query on partitioned tables. Is there a solution to this?
Thanks in advance.
> we need to execute
Oh, I thought that was done for you if you select the "Invalidate Impala Metadata on Destination" option.
Are you invalidating all metadata across the board or just the tables that you know you update?
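Invalidating only the tables a replication run actually touched, instead of everything, limits how much metadata has to be reloaded afterwards. A minimal sketch of choosing between the two forms; the helper name and the table names are hypothetical, and it assumes you already know which tables the run updated.

```python
from typing import Iterable, Optional


def invalidation_statements(tables: Optional[Iterable[str]] = None) -> list[str]:
    """Return INVALIDATE METADATA statements to run after replication.

    tables=None falls back to the global form, which discards cached metadata
    for every table. A list invalidates only those tables, so untouched tables
    keep their cached metadata. An empty list means nothing was replicated,
    so no statement is needed.
    """
    if tables is None:
        return ["INVALIDATE METADATA"]
    return [f"INVALIDATE METADATA {name}" for name in tables]


if __name__ == "__main__":
    # Hypothetical tables updated by this replication run.
    for stmt in invalidation_statements(["sales.events", "sales.orders"]):
        print(stmt)
```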
In any case, *something* has to reload/recalculate that metadata. If you don't want that to be the first real user, I would try running some sort of query on the table yourself. That way you take the initial latency hit instead of your users.
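The warm-up idea could look like this: right after replication, invalidate each replicated table and immediately run a cheap query against it, before real users arrive. This is a sketch under an assumption: that any query referencing the table makes Impala reload its metadata, so a `LIMIT 1` query is enough to pay the reload cost. The helper name `warmup_plan` and the table names are hypothetical.

```python
from typing import Iterable


def warmup_plan(tables: Iterable[str]) -> list[str]:
    """Interleave invalidation with a cheap query per table.

    Assumption: referencing the table in any query triggers the metadata
    reload, so the LIMIT 1 query absorbs the latency instead of a user query.
    """
    plan: list[str] = []
    for name in tables:
        plan.append(f"INVALIDATE METADATA {name}")
        plan.append(f"SELECT 1 FROM {name} LIMIT 1")
    return plan


if __name__ == "__main__":
    # Hypothetical replicated tables with many partitions.
    for stmt in warmup_plan(["sales.events", "sales.orders"]):
        print(stmt)
```

This does not remove the reload time; it only moves it to a moment you control, such as the tail end of the replication schedule.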
I hope that helps. I am currently learning how to use BDR, so I am no expert.
Thanks @alexmc6 for your reply,
The note in the documentation is clear:
"you must run the Impala INVALIDATE METADATA statement on the destination cluster to prevent queries from failing".
I have some tables with a large number of partitions that I use in real-time cases, so there is no time to run a first query and wait several minutes.