Created 01-16-2017 08:57 PM
I'm trying to DELETE a few transactions from a table and I'm getting the following errors despite trying various changes. I have already enabled ACID in Ambari. FYI, I'm using the HDP 2.5 Sandbox; I don't want to try this on our office systems yet.
I am using a non-EXTERNAL (managed) table with:
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.compactor.worker.threads=1;
set hive.support.concurrency=true;
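For completeness, the full set of ACID-related session settings in my environment (including hive.txn.manager, which I confirm further down in this thread) is roughly:

```sql
-- Session settings for Hive ACID on HDP 2.5 (as used in this thread;
-- hive.txn.manager was also set, though omitted from the list above)
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.compactor.worker.threads=1;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```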
CREATE TABLE IF NOT EXISTS XYZ (
<Various Columns ...>
) CLUSTERED BY (State) into 8 buckets STORED AS ORC TBLPROPERTIES ("transactional'"="true") ;
INSERT INTO TABLE XYZ select
<VariousColumns in that Order of XYZ>
from XYZ_OLD limit 1000;
Then, when I issue a DELETE statement, I get one of the following two errors, depending on the table properties:
delete from XYZ where state in ('NY','GA');
ERRORS:
FAILED: SemanticException [Error 10122]: Bucketized tables do not support INSERT INTO: Table: XYZ
or
FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table XYZ that does not use an AcidOutputFormat or is not bucketed
Created 01-16-2017 10:18 PM
The system doesn't think your table is an ACID table.
Can you post the output of "describe formatted XYZ"? (Note that above you have an extra " in the DDL.)
do you have hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager?
Are you able to "select from XYZ"?
Created 01-16-2017 10:49 PM
Thanks, Eugene, for the quick response.
Yes, DbTxnManager is set (sorry, I missed that in the original post).
Yes, SELECT from XYZ works fine.
FYI, when I do a simple 2-3 record DELETE and UPDATE with my own inserts, ACID works. But I created XYZ_OLD from a CSV file for more realistic testing.
The output of "describe formatted XYZ" is below...
-------------------------------------
# Detailed Table Information
Database: default
Owner: raj_ops
CreateTime: Mon Jan 16 08:21:09 UTC 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/XYZ
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
numFiles 8
numRows 1000
rawDataSize 1563286
totalSize 44374
transactional' true
transient_lastDdlTime 1484557025
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: 8
Bucket Columns: [state]
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.137 seconds, Fetched: 49 row(s)
Created 01-16-2017 10:52 PM
[transactional' true] - note the extra single quote there. Can you try recreating the XYZ table w/o this?
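For reference, a corrected DDL sketch without the stray quote might look like this (the column list is elided as in the original post; the STRING type for State is an assumption):

```sql
-- Corrected TBLPROPERTIES: "transactional" with no stray single quote
CREATE TABLE IF NOT EXISTS XYZ (
  -- <Various Columns ...>
  State STRING  -- type assumed for illustration
)
CLUSTERED BY (State) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
```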
Created 01-16-2017 11:31 PM
🙂 Thanks a bunch.
That single quote is a big miss on my part, and I'm checking for other mistakes like it :). I removed that single quote and also made sure the bucketed column is not in my DELETE WHERE condition (or UPDATE SET clause), and then it worked as expected.
So here are 2 questions.
Thanks a lot for your timely response.
Created 01-16-2017 11:38 PM
You cannot update any of the bucketing columns/partition columns.
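For example (city is a hypothetical non-bucketing column, assumed for illustration):

```sql
-- Fails: State is the bucketing column and cannot appear in the SET clause
-- UPDATE XYZ SET State = 'CA' WHERE State = 'NY';

-- OK: updating a non-bucketing column (city is hypothetical here)
UPDATE XYZ SET city = 'Albany' WHERE State = 'NY';
```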
I'm not sure I understand the 2nd question.
Hive update/delete are meant for slowly changing data rather than OLTP-type workloads. The cost of updating 10% of the rows in a 100M-row table is about the same as updating 0.1%. If you execute many update/delete statements, your read performance may also be affected until you run compaction on the table/partition.
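Compaction can also be requested manually rather than waiting for the automatic compactor; a minimal sketch:

```sql
-- Request a major compaction (merges delta files into a new base file)
ALTER TABLE XYZ COMPACT 'major';

-- Check compaction progress
SHOW COMPACTIONS;
```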
Created 01-16-2017 11:50 PM
That makes sense ...
Created 01-17-2017 06:49 PM
Hi @Naren Reddy. @Eugene Koifman is correct - ACID tables are not designed for frequent OLTP-like transactions... They are much better suited to tables that have their deletes/updates applied every 15 minutes or longer.