Created 08-06-2017 11:32 PM
hive> delete from student where ods='2017_006_30' and id=1; Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]* Query returned non-zero code: 1, cause: null
I am getting result, however data is not getting deleted from hive table ?
Created 08-08-2017 10:32 AM
Finally Worked for Me and did some work around. Steps as below.
1) Create Temp table with same columns.
2) Overwrite table with required row data.
3)Drop Hive partitions and HDFS directory.
4)Insert records for respective partitions and rows.
5) verify the counts.
1) hive> select count(*) from emptable where od='17_06_30' and ccodee=!'123'; OK 27 hive> select count(*) from emptable where od='17_06_30' and ccodee='123'; OK 7 hive>show create table emptable_tmp; :- Note hdfs location
2)Create table and overwrite with required partitioned data hive> CREATE TABLE `emptable_tmp`( 'rowid` string,PARTITIONED BY (`od` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'; hive> insert into emptable_tmp partition(od) select * from emptable where od='17_06_30' and ccodee!='123'; Time taken for adding to write entity : 1 Partition database.emptable_tmp{od=17_06_30} stats: [numFiles=20, numRows=27,totalSize=6216,rawDataSize=5502] OK
3) Drop Partitions from Hive and HDFS directory as well, as this is External table. hive> alter table emptable drop partition(od='17_06_30'); Dropped the partition od=17_06_30 OK Time taken: 0.291 seconds HDFS partition deletion #hdfs dfs -rm -r /hdfs/location/emptable/ods='17_06_30'
4) Insert data for that partition only. hive> insert into emptable partition(od) select * from emptable_tmp; Partition database.emptable{ds=17_06_30} stats: [numFiles=66, numRows=20, totalSize=5441469982, rawDataSize=] OK Time taken: 27.282 seconds
5) Verifying the counts on partitions and respective rows data 1) hive> select count(*) from emptable where od='17_06_30' and ccodee=!'123'; OK 27 hive> select count(*) from emptable where od='17_06_30' and ccodee='123'; OK 0
Created 08-07-2017 02:37 AM
Delete/Update in Hive, There are some limitation.
Plz check this link 'https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete'
Created 08-07-2017 04:51 AM
thanks for reply, yep tried to delete using command as below.
hive>delete emp_table where ods='2017_006_30'and id=1; Usage:delete[FILE|JAR|ARCHIVE]<value>[<value>] *Query returned non-zero code:1, cause:null
Out put got as deleted, actually rows are not deleted from table.
Hive Version:- 1.2.1+
Created 08-07-2017 07:13 AM
Finally, I deleted a row !!!
There are lots of limitations with following link(ex: bucket, orc, set variables, ...)
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Limitations
set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
create table test_delete ( id int, name string )
CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC
TBLPROPERTIES ("transactional"="true");
INSERT INTO TABLE test_delete VALUES (31, 'aaa31');
INSERT INTO TABLE test_delete VALUES (32, 'aaa32');
INSERT INTO TABLE test_delete VALUES (33, 'aaa33');
INSERT INTO TABLE test_delete VALUES (34, 'aaa34');
INSERT INTO TABLE test_delete VALUES (35, 'aaa35');
hive> select * from test_delete;
OK
31 aaa31
32 aaa32
33 aaa33
34 aaa34
35 aaa35
Time taken: 0.06 seconds, Fetched: 5 row(s)
delete from test_delete where name = 'aaa33';
hive> select * from test_delete;
OK
32 aaa32
34 aaa34
31 aaa31
35 aaa35
Time taken: 0.129 seconds, Fetched: 4 row(s)
Created 08-08-2017 05:02 AM
Looks like its not working for the partitioned tables, pleaes verify the logs.
Before Delete :- Counts hive> select count(*) from emptable where ods='2017_06_30' and code!='123'; OK 12 Time taken: 32.57 seconds, Fetched: 1 row(s)<br>
Delete Command hive> set hive.support.concurrency=true; hive>set hive.enforce.bucketing=true; hive>set hive.exec.dynamic.partition.mode=nonstrict; hive>set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; hive> delete emptable where ods='2017_06_30' and code!='123'; Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]* Query returned non-zero code: 1, cause: null
After delete still same recrods :- hive> select count(*) from emptable where ods='2017_06_30' and code!='123'; OK 12 Time taken: 26.406 seconds, Fetched: 1 row(s)
Created 08-08-2017 06:30 AM
Looks like its not working for the partitioned tables, pleaes verify the logs.
Created 08-08-2017 09:16 AM
Try this,
hive> delete emptable where ods='2017_06_30' and code!='123';
--->
hive> delete from emptable where ods='2017_06_30' and code!='123';
Created 08-08-2017 10:32 AM
Finally Worked for Me and did some work around. Steps as below.
1) Create Temp table with same columns.
2) Overwrite table with required row data.
3)Drop Hive partitions and HDFS directory.
4)Insert records for respective partitions and rows.
5) verify the counts.
1) hive> select count(*) from emptable where od='17_06_30' and ccodee=!'123'; OK 27 hive> select count(*) from emptable where od='17_06_30' and ccodee='123'; OK 7 hive>show create table emptable_tmp; :- Note hdfs location
2)Create table and overwrite with required partitioned data hive> CREATE TABLE `emptable_tmp`( 'rowid` string,PARTITIONED BY (`od` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'; hive> insert into emptable_tmp partition(od) select * from emptable where od='17_06_30' and ccodee!='123'; Time taken for adding to write entity : 1 Partition database.emptable_tmp{od=17_06_30} stats: [numFiles=20, numRows=27,totalSize=6216,rawDataSize=5502] OK
3) Drop Partitions from Hive and HDFS directory as well, as this is External table. hive> alter table emptable drop partition(od='17_06_30'); Dropped the partition od=17_06_30 OK Time taken: 0.291 seconds HDFS partition deletion #hdfs dfs -rm -r /hdfs/location/emptable/ods='17_06_30'
4) Insert data for that partition only. hive> insert into emptable partition(od) select * from emptable_tmp; Partition database.emptable{ds=17_06_30} stats: [numFiles=66, numRows=20, totalSize=5441469982, rawDataSize=] OK Time taken: 27.282 seconds
5) Verifying the counts on partitions and respective rows data 1) hive> select count(*) from emptable where od='17_06_30' and ccodee=!'123'; OK 27 hive> select count(*) from emptable where od='17_06_30' and ccodee='123'; OK 0