Created on 03-02-2023 10:18 PM - edited 03-02-2023 10:20 PM
Hi All,
We are facing an issue where, no matter what we try, Impala queries randomly throw a "Failed to open HDFS file" error. This seemingly started out of nowhere, and we are not sure what else to try.
Below are some of the things we have tried.
1. Enforced SYNC_DDL.
2. We used to have 87 Impala daemons (each acting as both executor and coordinator). We set up dedicated coordinators (4 coordinators + 83 executors) and load-balanced them with HAProxy.
3. Tried adding an INVALIDATE METADATA step, and then removing it again.
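For item 1, this is roughly how we enable the option per session (it can also be set as a default query option on the coordinators):

```sql
-- Make DDL/DML metadata changes visible cluster-wide before the statement returns
SET SYNC_DDL=1;
```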
Below is the sequence of queries.
1. INSERT OVERWRITE into a table (approximately every hour).
2. REFRESH.
3. COMPUTE STATS.
4. SELECT.
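For reference, the hourly sequence looks roughly like this (the table name `sales_agg` is purely illustrative, not our real table):

```sql
-- Step 1: rewrite the table (runs roughly every hour)
INSERT OVERWRITE TABLE sales_agg
SELECT id, SUM(amount) FROM sales_raw GROUP BY id;

-- Steps 2-3: pick up the new files and recompute stats
REFRESH sales_agg;
COMPUTE STATS sales_agg;

-- Step 4: downstream reads, which may land on a different coordinator via HAProxy
SELECT * FROM sales_agg;
```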
The SELECT never fails on the coordinator that ran the INSERT, but it fails randomly on other coordinators, and it keeps failing until a REFRESH is run. As soon as a REFRESH is run on the failing coordinator, the query succeeds.
This leads me to believe it is a metadata sync issue across coordinators. The problem is that multiple applications/dashboards use Impala, and we cannot ask each of them to run a REFRESH every time.
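For illustration, this is the manual workaround we currently apply when a coordinator starts failing (hostname and table name are hypothetical; in practice we have to bypass HAProxy and target the failing coordinator directly):

```shell
# Point impala-shell at the specific failing coordinator, not the load balancer
impala-shell --ssl -i failing-coordinator.example.com:21000 -q "REFRESH sales_agg;"
```

This clears the error on that coordinator until the next INSERT OVERWRITE, which is why it does not scale to every client application.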
impalad version 3.2.0-cdh6.3.3
Any help is appreciated.
Regards
SohamR
Created 03-04-2023 09:55 AM
Hi All,
Just something I have noticed: whenever we set SYNC_DDL and try to run a REFRESH, sometimes the query does not even register or produce a query ID, and at the same time I see the error below in the coordinator logs:
I0304 18:53:17.278281 174197 thrift-util.cc:124] TAcceptQueueServer: Caught TException: SSL_read: Connection reset by peer
Does this point to an actual network/SSL error? Any insights would be helpful.
Regards
SohamR
Created 03-08-2023 10:12 PM
Hi All,
Here is an example of an even worse scenario.
Can anyone please help with any ideas?
Regards
SohamR