New Contributor
Posts: 1
Registered: ‎01-27-2014

Impala insert overwrite breaks simultaneous queries

We are using Impala 1.1 with CDH 4.4, and we have an Impala/Hive table that is partitioned on year/month/day/hour. The problem we observed is that running an Impala "insert overwrite ..." query into various partitions sometimes breaks simultaneous queries from other terminals.

It does not always happen, but it makes me suspect that Impala's insert overwrite implementation does not refresh the metadata atomically, so there may be a moment when the old data files have been moved or removed but the metadata on the Impala nodes is still pointing to them.

Has anyone had a similar experience or found a workaround?

For example, the error looks like this:

ERROR: Failed to open HDFS file hdfs://ip-10-224-183-156.us-west-2.compute.internal:8020/data/combined/year=2014/month=01/day=27/hour=21/4846160764099747119--5487238769970965621_489411441_data.0
Error(255): Unknown error 255
Cancelling query ...
ERROR: Invalid or unknown query handle
Could not execute command: select account_id,
substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10) 'date',
count(1) impressions,
sum(ck) clicks
from combined
where substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)>date_add('2014-01-27',-30)
and substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)<='2014-01-27'
group by account_id,
substr(cast(from_utc_timestamp(concat(year,'-',month,'-',day,' ',hour,':00:00'),'America/New_York') as string),1,10)

 

Cloudera Employee
Posts: 27
Registered: ‎09-27-2013

Re: Impala insert overwrite breaks simultaneous queries

Are you concurrently writing to the same partitions, or is every query targeting a different partition?

New Contributor
Posts: 4
Registered: ‎09-11-2014

Re: Impala insert overwrite breaks simultaneous queries

Hi there, 

I'm having exactly the same problem: if I run a select on the partition I'm overwriting while the insert overwrite is in progress, I get "Failed to open HDFS file". Is there any update on this?

New Contributor
Posts: 2
Registered: ‎05-17-2017

Re: Impala insert overwrite breaks simultaneous queries

We are also facing a similar issue. We insert into an Impala table from a lot of other small tables every 5 minutes, so the main table accumulates a lot of small files, which is affecting Impala performance. To compact them, we run an insert overwrite into the table every 6 hours, selecting from the same table. Any queries that are already running during that insert overwrite statement fail with a file-not-found error.

Is there any update or workaround for this issue?

Impala version : v2.7.0-cdh5.9.1 

 

Thanks.

Cloudera Employee
Posts: 59
Registered: ‎12-07-2015

Re: Impala insert overwrite breaks simultaneous queries

Impala does not support transactions so altering data while reading it will lead to these conflicts. For a workaround I recommend having a look at this blog post: https://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/
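
A rough sketch of one way to avoid rewriting files that readers are using, which is my own assumption and not necessarily the exact recipe from the blog post: have readers query a view, write the compacted data into a fresh table instead of overwriting in place, and repoint the view once the copy is done. Queries that started before the switch should keep reading the old table, whose files are left alone. The Python helper below only renders the Impala statements and runs nothing against a cluster; the view name `combined_v`, the `_genN` table suffix, and the helper itself are hypothetical.

```python
def compaction_statements(view, source, generation, partition_cols):
    """Render Impala SQL that rewrites `source` into a fresh table, then repoints `view`.

    All names are illustrative; this only builds strings and executes nothing.
    """
    target = f"{source}_gen{generation}"
    parts = ", ".join(partition_cols)
    return [
        # New, empty table with the same schema and partitioning as the source.
        f"CREATE TABLE {target} LIKE {source}",
        # Dynamic-partition copy; SELECT * returns the partition columns last.
        f"INSERT INTO {target} PARTITION ({parts}) SELECT * FROM {source}",
        # Readers query the view, so this switch is the only step they observe.
        f"ALTER VIEW {view} AS SELECT * FROM {target}",
    ]


for stmt in compaction_statements(
    "combined_v", "combined", 2, ["year", "month", "day", "hour"]
):
    print(stmt + ";")
```

The old table can be dropped once the queries that started before the switch have finished. This sketch ignores rows inserted into the source while the copy is running; those would need separate handling.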

 

 

New Contributor
Posts: 2
Registered: ‎05-17-2017

Re: Impala insert overwrite breaks simultaneous queries

Thanks, Lars. Will take a look at that.