Created 01-27-2014 06:24 PM
We are using Impala 1.1 with CDH 4.4, and we have an Impala/Hive table that is partitioned on year/month/day/hour. The problem we observed is that running an Impala "insert overwrite ..." query into various partitions sometimes breaks queries running simultaneously from other terminals.
It does not happen every time, but it makes me suspect that Impala's insert overwrite implementation does not refresh the metadata atomically, so there may be a moment when the old data files have been moved or removed while the metadata on the Impala nodes is still pointing to them.
Has anyone had a similar experience or found a workaround?
For example, the error looks like this:
ERROR: Failed to open HDFS file hdfs://ip-10-224-183-156.us-west-2.compute.internal:8020/data/combined/year=2014/month=01/day=27/hour=21/4846160764099747119--5487238769970965621_489411441_data.0
Error(255): Unknown error 255
Cancelling query ...
ERROR: Invalid or unknown query handle
Could not execute command: select account_id,
substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10) 'date',
count(1) impressions,
sum(ck) clicks
from combined
where substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)>date_add('2014-01-27',-30)
and substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)<='2014-01-27'
group by account_id,
substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)
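
As a stopgap on the client side, one option I am considering is simply retrying the read query when it fails with the "Failed to open HDFS file" message above. The following is only a minimal sketch of that idea, not a fix for the underlying behaviour: `run_query` is a hypothetical callable standing in for whatever client submits the query to Impala, and the attempt count and delay are arbitrary assumptions.

```python
import time

# Error fragment taken from the failure shown above; treated here as transient.
TRANSIENT_MARKERS = ("Failed to open HDFS file",)


def run_with_retry(run_query, sql, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call run_query(sql), retrying when the error looks like a missing data file.

    run_query is a hypothetical, caller-supplied function (an assumption for
    this sketch); it should raise an exception whose message contains the
    error text reported by Impala.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except Exception as exc:
            if not any(marker in str(exc) for marker in TRANSIENT_MARKERS):
                raise  # unrelated failure: do not hide it behind retries
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)  # give the concurrent insert time to finish
    raise last_error
```

The `sleep` parameter is injectable only so the helper can be exercised without real delays. Errors that do not match the marker are re-raised immediately, so genuine query problems are not masked.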