We are facing an issue where, no matter what we try, Impala queries randomly fail with a "Failed to open HDFS file" error. This started seemingly out of nowhere, and we are not sure what else to try.
Below are some of the things we have tried.
1. Enforced SYNC_DDL.
2. We used to have 87 Impala daemons, each acting as both executor and coordinator. We set up dedicated coordinators (4 coordinators + 83 executors) and load balanced them with HAProxy.
3. Tried adding INVALIDATE METADATA, and then removing it.
Below is the sequence of queries.
1. INSERT OVERWRITE a table (approximately every hour).
3. Compute stats.
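As a rough sketch, the sequence can be written as follows. The table names db.target_table and db.staging_table are placeholders for illustration only, and the final SELECT stands in for any query a dashboard issues through a different coordinator.

```sql
-- Placeholder names: db.target_table and db.staging_table are hypothetical.

-- With SYNC_DDL enabled, the statement should return only after the
-- metadata change has been propagated to all coordinators.
SET SYNC_DDL=1;

-- Hourly load: replace the table contents.
INSERT OVERWRITE TABLE db.target_table
SELECT * FROM db.staging_table;

-- Recompute table and column statistics after the load.
COMPUTE STATS db.target_table;

-- Issued later by another client, possibly via a different coordinator.
SELECT COUNT(*) FROM db.target_table;
```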
The SELECT never fails on the coordinator that ran the INSERT, but it fails randomly on other coordinators, and it keeps failing there until a REFRESH. As soon as a REFRESH is run on the failing coordinator, the query succeeds.
This leads me to believe it is a metadata sync issue across coordinators. The problem is that multiple applications/dashboards are using Impala and we cannot ask them to do a refresh every time.
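For reference, the workaround amounts to something like the following, run while connected directly to the coordinator that is failing rather than through the load balancer. The table name db.target_table is a placeholder, not the real one.

```sql
-- Placeholder name: db.target_table is hypothetical.
-- Run on the coordinator where the query fails.

-- Reload the file and block metadata for the table on this coordinator.
REFRESH db.target_table;

-- The same query that previously failed now succeeds on this coordinator.
SELECT COUNT(*) FROM db.target_table;
```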
impalad version 3.2.0-cdh6.3.3
Any help is appreciated.