I am trying to delete some of the rows from my hive table which has partitions. This is what I did.
delete from <table_name> where <condition>;
However, I am getting following error.
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
Please anyone suggest why the query is not working.
Thanks in advance.
1.Need to enable ACID Property
2. table need to be in ORC format and bucket enabled.
Check table definition using "SHOW CREATE TABLE", to know if it satisfies above conditions.
Table is stored as TEXTFILE. It is not in ORC format or bucket enable. For such case is there a way around to delete the rows from such tables
If your table is text format then the table won't have any delete/update capabilities.
The work around for this case as follows,
If your table is partitioned:
1.Then select the partition that you want to delete rows from and make sure any new data is not writing into this partition.
2.Take the specific partition data into temp table
hive> create table <db_name>.<temp_table_name> as select * from <db_name>.<partition_table_name> where <partition_field_name>="<desired_partition_value>";
3.Overwrite the same partition by excluding the unnecessary rows
hive> insert overwrite <db_name>.<partition_table_name> partition(<partition_field_name>) select * from <db_name>.<temp_table_name> where <field_name> not in (<values_to_exclude>);
4. once you make sure that the data is correct then drop the temp table
hive> drop table <db_name>.<temp_table_name>;
These are the steps we need to follow for deleting specific rows in case of non-transactional table.
In addition if you are having non partitioned table then we need to get full dump of existing(target) table into temp table and overwrite the target table by excluding the unnecessary rows from the temp table and most important until this process is finished make sure you are not writing any new data into target table.
Even hive supports select and overwrite the same table at same time but any wrong queries will lead to loose data completely so it's better to use temp table in place and drop the table when we make sure the data is correct.
insert overwrite table <db_name>.<partition_table_name> partition(<partition_field_name>) select * from <db_name>.<partition_table_name> where <field_name> not in (<values_to_exclude>);