Welcome to the Cloudera Community

Impala insert overwrite breaks simultaneous queries

We are using Impala 1.1 with CDH 4.4, and we have an Impala/Hive table that is partitioned on year/month/day/hour. The problem we observed is that when we run an Impala "insert overwrite ..." query into various partitions, it sometimes breaks simultaneous queries running from other terminals.

Although it does not always happen, it makes me suspect that Impala's insert overwrite implementation does not refresh the metadata atomically, so there may be a moment when the old data files have been moved or removed but the metadata on the Impala nodes is still pointing to them.
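
To make the suspected race concrete, here is a toy sketch in Python. It is not Impala's actual code: the functions `list_partition_files`, `insert_overwrite`, `read_all`, and `simulate_race`, and the file names `old_data.0` and `new_data.0`, are all made up for illustration, and a local temporary directory stands in for HDFS. It only shows how a reader holding a stale file list could fail once an overwrite removes the old files.

```python
import os
import tempfile


def list_partition_files(partition_dir):
    """Stand-in for cached metadata: a snapshot of file paths at one moment."""
    return sorted(
        os.path.join(partition_dir, name) for name in os.listdir(partition_dir)
    )


def insert_overwrite(partition_dir, new_name, payload):
    """Hypothetical non-atomic overwrite: delete old files, then write the new one."""
    for name in os.listdir(partition_dir):
        os.remove(os.path.join(partition_dir, name))
    with open(os.path.join(partition_dir, new_name), "w") as f:
        f.write(payload)


def read_all(paths):
    """Reader that trusts whatever list of paths it was given."""
    rows = []
    for path in paths:
        with open(path) as f:
            rows.append(f.read())
    return rows


def simulate_race():
    with tempfile.TemporaryDirectory() as root:
        partition = os.path.join(root, "year=2014", "month=01", "day=27", "hour=21")
        os.makedirs(partition)
        with open(os.path.join(partition, "old_data.0"), "w") as f:
            f.write("old rows")

        # The reader plans its scan from a snapshot of the file list.
        stale_paths = list_partition_files(partition)

        # The writer overwrites the partition before the reader opens the files.
        insert_overwrite(partition, "new_data.0", "new rows")

        try:
            read_all(stale_paths)
            stale_result = "ok"
        except FileNotFoundError:
            stale_result = "failed to open stale file"

        # A reader that re-lists the partition after the overwrite succeeds.
        fresh_rows = read_all(list_partition_files(partition))
        return stale_result, fresh_rows


if __name__ == "__main__":
    print(simulate_race())
```

In this model the only reader that fails is the one that captured its file list before the overwrite and opened the files after it, which would match the intermittent nature of the error if the hypothesis is right.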

 

Has anyone had a similar experience or found a workaround?

For example, the error looks like this:

ERROR: Failed to open HDFS file hdfs://ip-10-224-183-156.us-west-2.compute.internal:8020/data/combined/year=2014/month=01/day=27/hour=21/4846160764099747119--5487238769970965621_489411441_data.0
Error(255): Unknown error 255
Cancelling query ...
ERROR: Invalid or unknown query handle
Could not execute command: select account_id,
substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10) 'date',
count(1) impressions,
sum(ck) clicks
from combined
where substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)>date_add('2014-01-27',-30)
and substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)<='2014-01-27'
group by account_id,
substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)