We have Parquet tables that receive a lot of inserts during the day. The tables are partitioned by date. Since Impala creates a new file for every insert, we want to be able to "compress" all the files for one day into a single file.
I tried to use an INSERT OVERWRITE statement like this:
insert overwrite table partition(kol) select * from table where kol between limit1 and limit2
It worked for one test table.
I then created a few more test tables, one of which had the same definition as the first. The statement did not work for any of them, and I don't understand why: the partitions I had overwritten did not end up with a single file on HDFS.
I also tried running REFRESH and COMPUTE STATS on the tables, but that didn't help either.
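For concreteness, a filled-in version of the statement might look like the sketch below. The table name `events`, the string partition column `day`, and the date values are hypothetical placeholders, not the real schema. The second statement is one way to count the files in a partition from Impala itself.

```sql
-- Hypothetical names: table `events`, partitioned by a STRING column `day`.
-- Rewrite the selected partitions in place. With a dynamic partition clause
-- the partition column must come last in the select list, which SELECT *
-- satisfies for a partitioned Impala table.
INSERT OVERWRITE TABLE events PARTITION (day)
SELECT * FROM events
WHERE day BETWEEN '2016-01-01' AND '2016-01-07';

-- List the data files of one rewritten partition to see how many there are.
SHOW FILES IN events PARTITION (day = '2016-01-01');
```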