Support Questions

Find answers, ask questions, and share your expertise

ACID Transactions DELETE & UPDATE Issues ... STILL Bugs are there on HDP 2.5 ?

avatar
Explorer

I'm trying to do DELETE few transactions from the table and I'm getting the following errors various changes. I already turned on ACID in Ambari. FYI, I'm using HDP 2.5 Sandbox. I don't want to try on our office systems yet.

I am using non EXTERNAL Table with

set hive.enforce.bucketing=true;

set hive.enforce.sorting=true;

set hive.exec.dynamic.partition.mode=nonstrict;

set hive.compactor.worker.threads=1;

set hive.support.concurrency=true;

CREATE TABLE IF NOT EXISTS XYZ (

<Various Columsn ...>

) CLUSTERED BY (State) into 8 buckets STORED AS ORC TBLPROPERTIES ("transactional'"="true") ;

INSERT INTO TABLE XYZ select

<VariousColumns in that Order of XYZ>

from XYZ_OLD limit 1000;

Then When I Give DELETE statement, I am getting the 2 errors based on various Table properties.

delete from XYZ where state in ('NY','GA');

ERRORS:

FAILED: SemanticException [Error 10122]: Bucketized tables do not support INSERT INTO: Table: XYZ

or

FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table XYZ that does not use an AcidOutputFormat or is not bucketed

1 ACCEPTED SOLUTION

avatar
Super Collaborator

[transactional' true] - note the extra single quote there. Can you try recreating the XYZ table w/o this?

View solution in original post

8 REPLIES 8

avatar
Super Collaborator

They system doesn't think your table is an Acid table.

Can you post "describe formatted XYZ"? (Note that above you have an extra " in DDL)

do you have hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager?

Are you able to "select from XYZ"?

avatar
Explorer

Thanks Eugene for Quick Response.

Yes, DbTxnManager is set(Sorry, I missed that in original post).

Yes, Select from XYZ is working fine.

FYI, when I do simple 2/3 recordings for DELETE and Updatete with my own insert ACID is working. But I created the XYZ_OLD from a csv file for actual real time testing.

describe formated XYZ; Details are below...

-------------------------------------

# Detailed Table Information

Database: default

Owner: raj_ops

CreateTime: Mon Jan 16 08:21:09 UTC 2017

LastAccessTime: UNKNOWN

Protect Mode: None

Retention: 0

Location: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/XYZ

Table Type: MANAGED_TABLE

Table Parameters:

COLUMN_STATS_ACCURATE{\"BASIC_STATS\":\"true\"}

numFiles 8

numRows 1000

rawDataSize 1563286

totalSize 44374

transactional' true

transient_lastDdlTime1484557025

# Storage Information

SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde

InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat

Compressed: No

Num Buckets: 8

Bucket Columns: [state]

Sort Columns: []

Storage Desc Params:

serialization.format1

Time taken: 0.137 seconds, Fetched: 49 row(s)

avatar
Super Collaborator

[transactional' true] - note the extra single quote there. Can you try recreating the XYZ table w/o this?

avatar
Explorer

🙂 Thanks a bunch.

That single Quote is a big miss on my part and I'm checking all other big ones :). I removed that single quite and also make sure Bucketed column is not in my delete where condition statement - then it worked as expected.

So here are 2 questions.

  1. For all Updates, the limitation is Bucketed column should not be in UPDATE where condition.
  2. Why even small transaction Updates and Deletes are taking lot of time. Infact, total Transaction Table itself is running slow.

Thanks a lot for your timely response.

avatar
Super Collaborator

You cannot update any of the bucketing columns/partition columns

I'm not sure I understand the 2nd question.

Hive update/delete are meant for slowly changing data, rather than OLTP type of workload. The cost updating 10% of the rows in 100M row table is about the same as updating 0.1%. If you execute many update/delete statements, your read performance may will also be affected until you run Compaction on the table/partition.

avatar
Explorer

That makes sense ...

avatar

Hi @Naren Reddy. @Eugene Koifman is correct - ACID tables are not designed for frequent OLTP-like transactions... It is much more optimal for tables that get their deletes/updates applied every 15 minutes or longer.

avatar
Explorer

🙂 Thanks a bunch.

That single Quote is a big miss on my part and I'm checking all other big ones :). I removed that single quote and also make sure Bucketed column is not in my update SET condition statement - then it worked as expected.

So here are 2 questions.

  1. For all Updates, the limitation is Bucketed column should not be in UPDATE's SET condition.
  2. Why even small transaction Updates and Deletes are taking lot of time. Infact, total Transaction Table itself is running slow.

Thanks a lot for your timely response.