Expert Contributor
Posts: 109
Registered: ‎05-19-2016

Update existing data with new data in Hive

[ Edited ]

I read that Cloudera does not support ACID operations in hive and anyway that's only supported with ORC format. 


I import data from an API using updated_at field. Now, I do not want my table to have multiple entries with the same id. Also, data for a partcular ID could be updated very frequently so it would result in a lot of duplication. I can access hive from R using Rhive but updates  and delete ooperations do not seem to work. 


I tried this in the query editor:



set = 10000000;
set = true;
set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads = 1 ;

DELETE FROM employee WHERE eid in (1, 4, 5)


It fails with this:



Error while compiling statement: FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations


The plan was to delete all the rows already there in hive which were imported as the updated rows from the API. 


What alternatives do I have? I cannot afford that level of duplication for every id so I need to find a way to delete or update data in hive.


`INSERT OVERWRITE in PARTITIONS` can't help since I import only updated data. Can't overwrite the entire partition with it.


I can use exact same queries through R in hive that work in hive **

Expert Contributor
Posts: 162
Registered: ‎09-29-2014

Re: Update existing data with new data in Hive

hive is not sutiable to do insert,update, delete, even currently hive already support insert , but it's not a good idea to use it in production env since it's very slow.


there are two solution for you to do that:


1. insert (i don't suggest this solution)


table construct like :  id , name

record like: 1,  "yourname"

if you want to change column name to "hisname", you can do below;


alter able xxx add column record-index int;


insert into table values(1,'hisname',2)\

then your record will be like 1, "yourname" 1      1,"histname",2


select * from  table where record-index = (select  max(record-indx) from table where id=1  


hope you understand the above solution.


2. hbase 


don't put your data into hive directly, use hase,  then you can delete,update,insert , this is a easy way  if you have requirement to change hive records.



Expert Contributor
Posts: 66
Registered: ‎12-24-2015

Re: Update existing data with new data in Hive

Add below properties to Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml. Gateway Default Group. 



I also tried to add it another places but it didnot work for me.


1. Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Hive (Service-Wide) --- Didn't work
2. Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Hive Metastore Server Default Group --- Didn't work.


I found update & delete were firing map-reduce statement so quite slow. One row update & delete took 21 seconds.




New Contributor
Posts: 5
Registered: ‎08-11-2017

Re: Update existing data with new data in Hive



I had to setup the properties in "Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml", otherwise, it wouldn't work.