Member since: 08-11-2020
Posts: 6
Kudos Received: 0
Solutions: 0
10-23-2020 07:52 AM
Hi @tusharkathpal! Thanks for the detailed explanation, I really appreciate it!

In my case, all tables were created beforehand, so all their static metadata should already be cached. However, clients issue commands to create partitions in Impala tables from time to time (every hour), and refresh commands are also issued periodically on those new partitions (every minute) to make the Parquet files inside them available for querying in Impala. I can confirm that only a handful of tables were being ingested into during the HDFS switchovers. Probably the partition creation on "impala_table", or a refresh command on one of its partitions, triggered a fetch of metadata from the catalog server, which would explain why it happened only for "impala_table".

About the Hive metatool command: it is listing the correct HDFS locations. I don't think it applies in my case, because HDFS is already deployed with the final nameservice in the config before Hadoop starts up (i.e., there is no upgrade from a non-HA to an HA setup involved).

About automatic invalidation of metadata: I will consider it for future Impala upgrades. It would help by handling the metadata change on the "alter table add partition" command. However, I would need to change part of the ingestion pipeline, because this use case of adding files directly on the filesystem is not supported.
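For readers following along, the ingestion pattern described in this post (an hourly partition creation followed by per-minute refreshes of that partition) could be sketched roughly as below. This is only an illustration, not the actual pipeline: the partition columns `day` and `hour` and the helper names are assumptions, since the post does not say how "impala_table" is partitioned. The helpers only build the statement text; how the statements are submitted to Impala is left out.

```python
from datetime import datetime


def partition_spec(ts: datetime) -> str:
    # Hypothetical partitioning scheme: a string `day` column and an int `hour` column.
    return f"day='{ts:%Y-%m-%d}', hour={ts.hour}"


def add_partition_sql(table: str, ts: datetime) -> str:
    # Issued once per hour, when a new partition is about to receive files.
    return f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({partition_spec(ts)})"


def refresh_partition_sql(table: str, ts: datetime) -> str:
    # Issued every minute so Parquet files written directly to HDFS become queryable.
    return f"REFRESH {table} PARTITION ({partition_spec(ts)})"


if __name__ == "__main__":
    hour_start = datetime(2020, 10, 23, 7)
    print(add_partition_sql("impala_table", hour_start))
    print(refresh_partition_sql("impala_table", hour_start))
```

Either of these two statements touches the table's metadata, which is consistent with the guess above that one of them triggered the metadata fetch from the catalog server.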
09-28-2020 07:58 AM
Hi @PauloRC @Tim Armstrong, this might be a performance regression, but it is also, more generally, a performance inefficiency in a specific planner data structure. A correctness fix for IMPALA-8386 may have introduced this perf regression in 3.2.1. IMPALA-9358 may resolve the issue, but I don't think it's available in any CDH 6.3 release yet. @PauloRC, one thing to try that might mitigate the issue is to run your view query with SET ENABLE_EXPR_REWRITES=false and see if that helps.
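A minimal sketch of how that test might be scripted, assuming the statements are run in a single Impala session (for example pasted into impala-shell). The helper name and the view name `my_view` are hypothetical, and restoring the option to `true` afterwards assumes that `true` is the default in your release.

```python
def with_expr_rewrites_disabled(query: str) -> str:
    """Wrap a query so it runs with expression rewrites turned off."""
    query = query.strip().rstrip(";")
    return "\n".join([
        "SET ENABLE_EXPR_REWRITES=false;",  # the mitigation suggested above
        f"{query};",
        "SET ENABLE_EXPR_REWRITES=true;",   # assumption: true is the default value
    ])


if __name__ == "__main__":
    # `my_view` is a placeholder for the slow view query.
    print(with_expr_rewrites_disabled("SELECT * FROM my_view"))
```

Comparing the planning time in the query profile with and without the option set should show whether expression rewrites are the bottleneck.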