01-27-2014 06:24 PM
We are using Impala 1.1 with CDH 4.4, and we have an Impala/Hive table partitioned on year/month/day/hour. One problem we have observed is that when we run an Impala "insert overwrite ..." query into various partitions, it sometimes breaks queries running at the same time from other terminals.
It doesn't happen every time, but it makes me suspect that Impala's insert overwrite implementation does not refresh the metadata atomically. There may be a moment when the old data files have already been moved or removed while the metadata on the Impala nodes still points to them.
Has anyone had a similar experience or found a workaround?
For example, the error looks like this:
ERROR: Failed to open HDFS file hdfs://ip-10-224-183-156.us-west-2.compute.interna
Error(255): Unknown error 255
Cancelling query ...
ERROR: Invalid or unknown query handle
Could not execute command: select account_id,
group by account_id,
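The suspicion above can be illustrated with a small local-filesystem simulation. This is a hypothetical Python sketch, not Impala's implementation: a reader captures a file listing (standing in for cached metadata), a writer then overwrites the partition, and the reader fails when it tries to open a file that no longer exists. The file names and helper functions are made up for illustration.

```python
import os
import tempfile

def write_partition(part_dir, gen, rows):
    """Write one data file for a partition (hypothetical file naming)."""
    path = os.path.join(part_dir, f"data_{gen}.txt")
    with open(path, "w") as f:
        f.write("\n".join(rows))
    return path

def insert_overwrite(part_dir, gen, rows):
    """Non-atomic overwrite: remove the old files, then write the new ones."""
    for name in os.listdir(part_dir):
        os.remove(os.path.join(part_dir, name))
    write_partition(part_dir, gen, rows)

def read_files(file_list):
    """A reader opens files from a listing captured earlier (like cached metadata)."""
    rows = []
    for path in file_list:
        with open(path) as f:
            rows.extend(f.read().splitlines())
    return rows

def simulate_race():
    with tempfile.TemporaryDirectory() as part_dir:
        write_partition(part_dir, 1, ["a", "b"])
        # The reader plans its scan and remembers which files to open.
        cached_listing = [os.path.join(part_dir, n) for n in os.listdir(part_dir)]
        # The writer overwrites the partition before the reader opens those files.
        insert_overwrite(part_dir, 2, ["c", "d"])
        try:
            return read_files(cached_listing)
        except FileNotFoundError as e:
            return f"Failed to open file: {os.path.basename(e.filename)}"

print(simulate_race())
```

The simulated writer deletes old files before writing new ones; Impala's real order of operations may differ, but any ordering in which the old files disappear while a reader still holds their paths produces the same kind of failure.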
09-11-2014 01:59 AM
I'm having exactly the same problem: if I run a select on the partition I'm overwriting while the insert overwrite is in progress, the query fails with "Failed to open HDFS file". Is there any update on this?
05-17-2017 06:31 PM - edited 05-17-2017 06:33 PM
We are also facing a similar issue. Every 5 minutes we insert into an Impala table from many other small tables, so the main table accumulates a lot of small files, which hurts Impala performance. To compact them, every 6 hours we run an insert overwrite into the table that selects from the same table. Any queries already running when that insert overwrite executes fail with a file-not-found error.
Is there any update or workaround for this issue?
Impala version : v2.7.0-cdh5.9.1
05-18-2017 03:02 AM
Impala does not support transactions, so modifying data while it is being read leads to these conflicts. For a workaround, I recommend having a look at this blog post: https://blog.cloudera.com/blog/2015/11/how-to-inge
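Since there are no transactions, one common way to avoid the conflict is to never rewrite data in place: write the compacted copy to a new location and then switch readers over with a single atomic step. In Impala terms this roughly corresponds to writing into a new table and repointing a view at it. The sketch below is a hypothetical local-filesystem analogy in Python, not necessarily the approach in the linked post. The versioned directories, the `CURRENT` pointer file, and all helper names are assumptions for illustration.

```python
import os
import tempfile

def write_version(root, version, rows):
    """Write a complete copy of the data into a new, versioned directory."""
    vdir = os.path.join(root, f"v{version}")
    os.makedirs(vdir)
    with open(os.path.join(vdir, "part-0.txt"), "w") as f:
        f.write("\n".join(rows))
    return vdir

def publish(root, vdir):
    """Repoint readers at vdir by atomically replacing a small pointer file."""
    tmp = os.path.join(root, "CURRENT.tmp")
    with open(tmp, "w") as f:
        f.write(vdir)
    os.replace(tmp, os.path.join(root, "CURRENT"))  # atomic rename on POSIX

def snapshot(root):
    """A reader resolves the pointer once and lists files from that version only."""
    with open(os.path.join(root, "CURRENT")) as f:
        vdir = f.read()
    return [os.path.join(vdir, n) for n in sorted(os.listdir(vdir))]

def read(files):
    rows = []
    for path in files:
        with open(path) as f:
            rows.extend(f.read().splitlines())
    return rows

def demo():
    with tempfile.TemporaryDirectory() as root:
        publish(root, write_version(root, 1, ["a", "b"]))
        in_flight = snapshot(root)                         # a long query plans its scan
        publish(root, write_version(root, 2, ["a", "b", "c"]))  # compaction publishes v2
        old = read(in_flight)                              # in-flight reader still sees v1
        new = read(snapshot(root))                         # new readers see v2
        return old, new

print(demo())
```

The trade-off is that old versions must be kept until no running query still uses them and cleaned up afterwards, for example by a later job that drops versions older than the longest expected query.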