Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Delete/update on hadoop partitioned table in Hive ?

Solved Go to solution

Delete/update on hadoop partitioned table in Hive ?

Expert Contributor
hive> delete from student where ods='2017_006_30' and id=1;
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null

I am getting result, however data is not getting deleted from hive table ?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Delete/update on hadoop partitioned table in Hive ?

Expert Contributor

Finally Worked for Me and did some work around. Steps as below.

1) Create Temp table with same columns.

2) Overwrite table with required row data.

3)Drop Hive partitions and HDFS directory.

4)Insert records for respective partitions and rows.

5) verify the counts.

1) hive> select count(*) from emptable where  od='17_06_30' and ccodee=!'123';
OK
27
hive> select count(*) from emptable where  od='17_06_30' and ccodee='123';
OK
7
hive>show create table emptable_tmp; :- Note hdfs location 
2)Create table and overwrite with required partitioned data
hive> CREATE  TABLE `emptable_tmp`(
    'rowid` string,PARTITIONED BY (`od` string) 
	ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.mapred.SequenceFileInputFormat';
	 
hive> insert into emptable_tmp partition(od) select * from emptable where od='17_06_30' and ccodee!='123';
         Time taken for adding to write entity : 1
Partition database.emptable_tmp{od=17_06_30} stats: [numFiles=20, numRows=27,totalSize=6216,rawDataSize=5502]
OK

3) Drop Partitions from Hive and HDFS directory as well, as this is External table.
hive> alter table emptable drop partition(od='17_06_30');
Dropped the partition od=17_06_30
OK
Time taken: 0.291 seconds
HDFS partition deletion
#hdfs dfs -rm -r /hdfs/location/emptable/ods='17_06_30'
4) Insert data for that partition only.

hive> insert into emptable partition(od) select * from emptable_tmp;
Partition database.emptable{ds=17_06_30} stats: [numFiles=66, numRows=20, totalSize=5441469982, rawDataSize=]
OK
Time taken: 27.282 seconds
5) Verifying the counts on partitions and respective rows data
1) hive> select count(*) from emptable where  od='17_06_30' and ccodee=!'123';
OK
27
hive> select count(*) from emptable where  od='17_06_30' and ccodee='123';
OK
0
7 REPLIES 7
Highlighted

Re: Delete/update on hadoop partitioned table in Hive ?

Contributor

Delete/Update in Hive, There are some limitation.

Plz check this link 'https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete'

Re: Delete/update on hadoop partitioned table in Hive ?

Expert Contributor

thanks for reply, yep tried to delete using command as below.

hive>delete  emp_table where ods='2017_006_30'and id=1;
Usage:delete[FILE|JAR|ARCHIVE]<value>[<value>]
*Query returned non-zero code:1, cause:null

Out put got as deleted, actually rows are not deleted from table.

Hive Version:- 1.2.1+

Re: Delete/update on hadoop partitioned table in Hive ?

Contributor

Finally, I deleted a row !!!

There are lots of limitations with following link(ex: bucket, orc, set variables, ...)

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Limitations

set hive.support.concurrency=true;

set hive.enforce.bucketing=true;

set hive.exec.dynamic.partition.mode=nonstrict;

set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table test_delete ( id int, name string )

CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC

TBLPROPERTIES ("transactional"="true");

INSERT INTO TABLE test_delete VALUES (31, 'aaa31');

INSERT INTO TABLE test_delete VALUES (32, 'aaa32');

INSERT INTO TABLE test_delete VALUES (33, 'aaa33');

INSERT INTO TABLE test_delete VALUES (34, 'aaa34');

INSERT INTO TABLE test_delete VALUES (35, 'aaa35');

hive> select * from test_delete;

OK

31 aaa31

32 aaa32

33 aaa33

34 aaa34

35 aaa35

Time taken: 0.06 seconds, Fetched: 5 row(s)

delete from test_delete where name = 'aaa33';

hive> select * from test_delete;

OK

32 aaa32

34 aaa34

31 aaa31

35 aaa35

Time taken: 0.129 seconds, Fetched: 4 row(s)

Re: Delete/update on hadoop partitioned table in Hive ?

Expert Contributor

Looks like its not working for the partitioned tables, pleaes verify the logs.

Before Delete :- Counts 
hive> select count(*) from emptable where  ods='2017_06_30' and code!='123';
OK
12
Time taken: 32.57 seconds, Fetched: 1 row(s)<br>
Delete Command
hive> set hive.support.concurrency=true;
hive>set hive.enforce.bucketing=true;
hive>set hive.exec.dynamic.partition.mode=nonstrict;
hive>set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> delete emptable where  ods='2017_06_30' and code!='123';
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null



After delete still same recrods :-
hive> select count(*) from emptable where  ods='2017_06_30' and code!='123';
OK
12
Time taken: 26.406 seconds, Fetched: 1 row(s)

Re: Delete/update on hadoop partitioned table in Hive ?

Expert Contributor

Looks like its not working for the partitioned tables, pleaes verify the logs.


Re: Delete/update on hadoop partitioned table in Hive ?

Contributor

Try this,

hive> delete emptable where ods='2017_06_30' and code!='123';

--->

hive> delete from emptable where ods='2017_06_30' and code!='123';

Re: Delete/update on hadoop partitioned table in Hive ?

Expert Contributor

Finally Worked for Me and did some work around. Steps as below.

1) Create Temp table with same columns.

2) Overwrite table with required row data.

3)Drop Hive partitions and HDFS directory.

4)Insert records for respective partitions and rows.

5) verify the counts.

1) hive> select count(*) from emptable where  od='17_06_30' and ccodee=!'123';
OK
27
hive> select count(*) from emptable where  od='17_06_30' and ccodee='123';
OK
7
hive>show create table emptable_tmp; :- Note hdfs location 
2)Create table and overwrite with required partitioned data
hive> CREATE  TABLE `emptable_tmp`(
    'rowid` string,PARTITIONED BY (`od` string) 
	ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.mapred.SequenceFileInputFormat';
	 
hive> insert into emptable_tmp partition(od) select * from emptable where od='17_06_30' and ccodee!='123';
         Time taken for adding to write entity : 1
Partition database.emptable_tmp{od=17_06_30} stats: [numFiles=20, numRows=27,totalSize=6216,rawDataSize=5502]
OK

3) Drop Partitions from Hive and HDFS directory as well, as this is External table.
hive> alter table emptable drop partition(od='17_06_30');
Dropped the partition od=17_06_30
OK
Time taken: 0.291 seconds
HDFS partition deletion
#hdfs dfs -rm -r /hdfs/location/emptable/ods='17_06_30'
4) Insert data for that partition only.

hive> insert into emptable partition(od) select * from emptable_tmp;
Partition database.emptable{ds=17_06_30} stats: [numFiles=66, numRows=20, totalSize=5441469982, rawDataSize=]
OK
Time taken: 27.282 seconds
5) Verifying the counts on partitions and respective rows data
1) hive> select count(*) from emptable where  od='17_06_30' and ccodee=!'123';
OK
27
hive> select count(*) from emptable where  od='17_06_30' and ccodee='123';
OK
0
Don't have an account?
Coming from Hortonworks? Activate your account here