Created on 04-17-2017 06:10 PM - edited 08-17-2019 10:31 PM
We have been facing an issue when inserting data from a managed table into an external table on S3. Each time new data lands in the managed table, we need to append it to the external S3 table. Instead of appending, the insert replaces the old data with the newly received data (the old data is overwritten). I came across a similar JIRA thread, but that patch is for Apache Hive (link at the bottom). Since we are on HDP, can anyone help me out with this?
Below are the Versions:
create external table tests3prd (c1 string, c2 string)
  location 's3a://testdata/test/tests3prd/';
create table testint (c1 string, c2 string);
insert into testint values (1, 2);
insert into tests3prd select * from testint;   (run 2 times)
When I re-insert the same values (1, 2), the existing row is overwritten and replaced with the new record.
Here are the S3 external table files, where each time *0000_0 is overwritten instead of a new copy being added.
PS: Jira Thread : https://issues.apache.org/jira/browse/HIVE-15199
Created 05-03-2017 11:53 AM
This is fixed in Hortonworks Cloud. Is this an on-premise cluster or Hortonworks Cloud?
Created 05-04-2017 07:41 AM
Thanks Rajesh for the reply. This is on AWS EC2 instances with HDP 2.5.3 installed. Let me know if you know of any workaround for this!
Created 08-18-2017 07:19 AM
You can try the workaround below:
- Create a merged temp table (old data + new data) using UNION ALL
- INSERT OVERWRITE the final table with the merged data
- Drop the temp table.
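A minimal HiveQL sketch of the steps above, using the table names from the original post (the temp table name tests3prd_tmp is illustrative):

```sql
-- 1. Build a temp table holding the existing S3 data plus the new rows
--    from the managed table, merged with UNION ALL.
create table tests3prd_tmp as
select c1, c2 from tests3prd
union all
select c1, c2 from testint;

-- 2. Overwrite the external S3 table with the merged result.
insert overwrite table tests3prd
select c1, c2 from tests3prd_tmp;

-- 3. Drop the temp table.
drop table tests3prd_tmp;
```

Note that between steps 1 and 2 any concurrent writes to tests3prd would be lost, so this should run as a single serialized job.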
Created 08-18-2017 07:28 AM
@Rajesh Balamohan We are also facing a similar issue. Please let us know if there are any fixes available, or any plan to fix this in a future HDP release.
HDP 2.5.3 (EC2 Instances), Hive 1.2.1
Created 12-29-2017 03:12 PM
It should be fixed in the current release of HDP 2.6. If not, please put in an official support request or JIRA ticket.
Created 12-29-2017 02:30 PM
Hello guys, I have a similar issue but with an external table on HDFS. Is there any solution for this so far? We are using HDP-18.104.22.168, and here is what my table looks like:
create external table test1 (c1 int, c2 int)
  CLUSTERED BY (c1) SORTED BY (c1) INTO 4 BUCKETS
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/user/companies/';
Currently the data is always overwritten.
Created 12-29-2017 03:17 PM
Which version of Hive are you using, 1.2 or 2.1?
Are there read permissions on that location, HDFS or S3?
If the data is in S3, you need to specify that in the table's LOCATION and make sure the permissions are set; by default Hive assumes the path is in HDFS.
Created 01-09-2018 08:01 PM
Hello guys, any update on this? As a reminder, in my case it occurs on HDFS, not S3.
Thank you in advance
Created 01-11-2018 05:52 PM
In case someone else faces the same problem: we solved this by making the table internal (managed), keeping the TextFile format, and storing the data under the default Hive warehouse directory. The table definition currently looks like this:
create table test1 (c1 int, c2 int)
  CLUSTERED BY (c1) SORTED BY (c1) INTO 4 BUCKETS
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE;