Impala insert overwrite breaks simultaneous queries

New Contributor

We are using Impala 1.1 with CDH 4.4, and we have an Impala/Hive table partitioned on year/month/day/hour. The problem we observed is that running an Impala "insert overwrite ..." query into various partitions sometimes breaks simultaneous queries from other terminals.

 

It does not always happen, but it makes me suspect that Impala's insert overwrite implementation does not refresh the metadata atomically, so there may be a moment when the old data files have been moved or removed but the metadata on the Impala nodes is still pointing to them.

 

Has anyone had a similar experience or found a workaround?

 

E.g., the error would say:

 

ERROR: Failed to open HDFS file hdfs://ip-10-224-183-156.us-west-2.compute.internal:8020/data/combined/year=2014/month=01/day=27/hour=21/4846160764099747119--5487238769970965621_489411441_data.0
Error(255): Unknown error 255
Cancelling query ...
ERROR: Invalid or unknown query handle
Could not execute command: select account_id,
substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10) 'date',
count(1) impressions,
sum(ck) clicks
from combined
where substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)>date_add('2014-01-27',-30)
and substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)<='2014-01-27'
group by account_id,
substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)

 

5 REPLIES

Re: Impala insert overwrite breaks simultaneous queries

Contributor

Are you concurrently writing to the same partitions, or is every query targeting a different partition?

Re: Impala insert overwrite breaks simultaneous queries

New Contributor

Hi there, 

I'm having exactly the same problem: if I run a select on the partition I'm overwriting while the insert overwrite is in progress, I get "Failed to open HDFS file". Is there any update on this?

Re: Impala insert overwrite breaks simultaneous queries

New Contributor

We are also facing a similar issue. We insert into an Impala table from a lot of other small tables every 5 minutes, so the main table has a lot of small files, which is affecting Impala performance. To compact them, we run an insert overwrite into the table by selecting from the same table every 6 hours. Any queries already running during that insert overwrite statement fail with a file-not-found error.

 

Is there any update or workaround for this issue?

Impala version : v2.7.0-cdh5.9.1 

 

Thanks.

Re: Impala insert overwrite breaks simultaneous queries

Expert Contributor

Impala does not support transactions, so altering data while it is being read will lead to these conflicts. For a workaround, I recommend having a look at this blog post: https://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/
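
One common way to avoid overwriting files that running queries are still reading is to write the new copy into a second table and then repoint a view that readers query. The following is only a minimal sketch of that idea, not the exact procedure from the post. The names combined_a, combined_b and combined_view are hypothetical, and it assumes all readers go through the view rather than the underlying tables.

```sql
-- Hypothetical names: combined_a (currently active), combined_b (inactive),
-- combined_view (what readers query). Adjust to your own schema.

-- One-time setup: a second table with the same layout, and a view over the active one.
CREATE TABLE combined_b LIKE combined_a;
CREATE VIEW combined_view AS SELECT * FROM combined_a;

-- Periodic rewrite: build the compacted copy in the inactive table,
-- so no files under combined_a are touched while queries read them.
INSERT OVERWRITE TABLE combined_b PARTITION (year, month, day, hour)
SELECT * FROM combined_a;

-- Repoint readers to the freshly written table (a metadata-only change).
ALTER VIEW combined_view AS SELECT * FROM combined_b;
```

On the next cycle the roles swap: rewrite into combined_a and point the view back at it. Before overwriting the table that was just retired, wait until queries that started before the swap have finished, since they may still be reading its files. Rows inserted into the active table while the rewrite is running are not carried over by this sketch and would need separate handling.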

 

 

Re: Impala insert overwrite breaks simultaneous queries

New Contributor
Thanks, Lars. Will take a look at that.