Support Questions

Find answers, ask questions, and share your expertise

Delete/update on hadoop partitioned table in Hive ?

avatar
Expert Contributor
hive> delete from student where ods='2017_006_30' and id=1;
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null

I am getting result, however data is not getting deleted from hive table ?

1 ACCEPTED SOLUTION

avatar
Expert Contributor

Finally Worked for Me and did some work around. Steps as below.

1) Create Temp table with same columns.

2) Overwrite table with required row data.

3)Drop Hive partitions and HDFS directory.

4)Insert records for respective partitions and rows.

5) verify the counts.

1) hive> select count(*) from emptable where  od='17_06_30' and ccodee=!'123';
OK
27
hive> select count(*) from emptable where  od='17_06_30' and ccodee='123';
OK
7
hive>show create table emptable_tmp; :- Note hdfs location 
2)Create table and overwrite with required partitioned data
hive> CREATE  TABLE `emptable_tmp`(
    'rowid` string,PARTITIONED BY (`od` string) 
	ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.mapred.SequenceFileInputFormat';
	 
hive> insert into emptable_tmp partition(od) select * from emptable where od='17_06_30' and ccodee!='123';
         Time taken for adding to write entity : 1
Partition database.emptable_tmp{od=17_06_30} stats: [numFiles=20, numRows=27,totalSize=6216,rawDataSize=5502]
OK

3) Drop Partitions from Hive and HDFS directory as well, as this is External table.
hive> alter table emptable drop partition(od='17_06_30');
Dropped the partition od=17_06_30
OK
Time taken: 0.291 seconds
HDFS partition deletion
#hdfs dfs -rm -r /hdfs/location/emptable/ods='17_06_30'
4) Insert data for that partition only.

hive> insert into emptable partition(od) select * from emptable_tmp;
Partition database.emptable{ds=17_06_30} stats: [numFiles=66, numRows=20, totalSize=5441469982, rawDataSize=]
OK
Time taken: 27.282 seconds
5) Verifying the counts on partitions and respective rows data
1) hive> select count(*) from emptable where  od='17_06_30' and ccodee=!'123';
OK
27
hive> select count(*) from emptable where  od='17_06_30' and ccodee='123';
OK
0

View solution in original post

7 REPLIES 7

avatar
Rising Star

Delete/Update in Hive, There are some limitation.

Plz check this link 'https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete'

avatar
Expert Contributor

thanks for reply, yep tried to delete using command as below.

hive>delete  emp_table where ods='2017_006_30'and id=1;
Usage:delete[FILE|JAR|ARCHIVE]<value>[<value>]
*Query returned non-zero code:1, cause:null

Out put got as deleted, actually rows are not deleted from table.

Hive Version:- 1.2.1+

avatar
Rising Star

Finally, I deleted a row !!!

There are lots of limitations with following link(ex: bucket, orc, set variables, ...)

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Limitations

set hive.support.concurrency=true;

set hive.enforce.bucketing=true;

set hive.exec.dynamic.partition.mode=nonstrict;

set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table test_delete ( id int, name string )

CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC

TBLPROPERTIES ("transactional"="true");

INSERT INTO TABLE test_delete VALUES (31, 'aaa31');

INSERT INTO TABLE test_delete VALUES (32, 'aaa32');

INSERT INTO TABLE test_delete VALUES (33, 'aaa33');

INSERT INTO TABLE test_delete VALUES (34, 'aaa34');

INSERT INTO TABLE test_delete VALUES (35, 'aaa35');

hive> select * from test_delete;

OK

31 aaa31

32 aaa32

33 aaa33

34 aaa34

35 aaa35

Time taken: 0.06 seconds, Fetched: 5 row(s)

delete from test_delete where name = 'aaa33';

hive> select * from test_delete;

OK

32 aaa32

34 aaa34

31 aaa31

35 aaa35

Time taken: 0.129 seconds, Fetched: 4 row(s)

avatar
Expert Contributor

Looks like its not working for the partitioned tables, pleaes verify the logs.

Before Delete :- Counts 
hive> select count(*) from emptable where  ods='2017_06_30' and code!='123';
OK
12
Time taken: 32.57 seconds, Fetched: 1 row(s)<br>
Delete Command
hive> set hive.support.concurrency=true;
hive>set hive.enforce.bucketing=true;
hive>set hive.exec.dynamic.partition.mode=nonstrict;
hive>set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> delete emptable where  ods='2017_06_30' and code!='123';
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null



After delete still same recrods :-
hive> select count(*) from emptable where  ods='2017_06_30' and code!='123';
OK
12
Time taken: 26.406 seconds, Fetched: 1 row(s)

avatar
Expert Contributor

Looks like its not working for the partitioned tables, pleaes verify the logs.


avatar
Rising Star

Try this,

hive> delete emptable where ods='2017_06_30' and code!='123';

--->

hive> delete from emptable where ods='2017_06_30' and code!='123';

avatar
Expert Contributor

Finally Worked for Me and did some work around. Steps as below.

1) Create Temp table with same columns.

2) Overwrite table with required row data.

3)Drop Hive partitions and HDFS directory.

4)Insert records for respective partitions and rows.

5) verify the counts.

1) hive> select count(*) from emptable where  od='17_06_30' and ccodee=!'123';
OK
27
hive> select count(*) from emptable where  od='17_06_30' and ccodee='123';
OK
7
hive>show create table emptable_tmp; :- Note hdfs location 
2)Create table and overwrite with required partitioned data
hive> CREATE  TABLE `emptable_tmp`(
    'rowid` string,PARTITIONED BY (`od` string) 
	ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.mapred.SequenceFileInputFormat';
	 
hive> insert into emptable_tmp partition(od) select * from emptable where od='17_06_30' and ccodee!='123';
         Time taken for adding to write entity : 1
Partition database.emptable_tmp{od=17_06_30} stats: [numFiles=20, numRows=27,totalSize=6216,rawDataSize=5502]
OK

3) Drop Partitions from Hive and HDFS directory as well, as this is External table.
hive> alter table emptable drop partition(od='17_06_30');
Dropped the partition od=17_06_30
OK
Time taken: 0.291 seconds
HDFS partition deletion
#hdfs dfs -rm -r /hdfs/location/emptable/ods='17_06_30'
4) Insert data for that partition only.

hive> insert into emptable partition(od) select * from emptable_tmp;
Partition database.emptable{ds=17_06_30} stats: [numFiles=66, numRows=20, totalSize=5441469982, rawDataSize=]
OK
Time taken: 27.282 seconds
5) Verifying the counts on partitions and respective rows data
1) hive> select count(*) from emptable where  od='17_06_30' and ccodee=!'123';
OK
27
hive> select count(*) from emptable where  od='17_06_30' and ccodee='123';
OK
0