Hello @Bala Vignesh N V thank you for your reply and my excuses for the insufficient information but I missed out to mention that in our case we have this behavior on HDFS not S3. Also I really want to avoid such a union because our inserts are already invoking huge select with join so this work around would exhaust our cluster resources and would make the ETL process really slow. Are you aware of such a bug? or it could be a configuration issue as well?
@Alexandros Biratsis i believe that you are not using Insert Overwrite when inserting the incremental records into the target. Assuming that its weird how the data is being overridden.
For the union part --> If you wanted to avoid union then you may have to may have to perform left join between the incremental data and target to apply some transformations ( assuming that you are performing SCD type 1). If you wanted to just append the data then insert the incremental data through multiple queries into the target one by one. By if you are inserting the data multiple times then the no of jobs will be more which be more or less equal to performing union over it. Sorry for the late reply.
Hello @Bala Vignesh N V I am sorry for the late reply although I was too busy lately and not following the forum. So finally we solved this by making the table internal, keeping the TextFile format and storing data under default Hive directory. The definition of table look like this at the moment:
create table test1(c1 int, c2 int) CLUSTERED BY(c1) SORTED BY(c1) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
So every time I was specifying the LOCATION Hive would overwrite the data but when changing to internal everything worked fine under the /apps/hive/warehouse/ directory