Tez with Transaction with Bucketing

Expert Contributor

When a table is partitioned and bucketed and has transactions enabled, Tez launches only 2 map tasks, while the same query on MR still launches 72 tasks (the table is about 17 GB). If transactions are not enabled, the query launches the correct number of Tez tasks. If there are any hints on why this may occur, please share.

1 ACCEPTED SOLUTION

Expert Contributor

After running a major compaction on each partition of the tables (HDP 2.2.4.2-2), we got the right number of Tez mappers. So this appears to be a bug related to compaction.

Alex,

Yes, they are both submitting to the same queue.

ACID transactions are considered broken until further advisory. I am looking for more details on why they are broken. If you have details, please send me a note.


6 REPLIES 6

Master Mentor

@pbalasundaram@hortonworks.com

What version of Hive and HDP are you using?


Are the Tez jobs submitting to the same queue as MR jobs? (hive.server2.tez.default.queues, hive.server2.tez.sessions.per.default.queue)

How do the Tez container settings compare with the general YARN container settings? (tez.am.resource.memory.mb, tez.am.java.opts, hive.tez.container.size, hive.tez.java.opts)


@pbalasundaram@hortonworks.com, @Ryan Templeton and @ravi@hortonworks.com found the same issue. I think @ravi is working with engineering to fix it or find a workaround.

A few things to note:

  • If the data was inserted into the table using Hive streaming, there will be a lot of small files; these are merged into larger files only during compaction. If you haven't enabled the compactor in the metastore, they will not be compacted automatically, and you will need to request a compaction explicitly.
  • The recommendation for customers on HDP 2.2 is not to deploy Transactions in production. There are issues with bucketing going awry, which could lead to data corruption. Fixes are expected in the upcoming HDP 2.3 releases, so stay tuned.
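
For anyone who needs to request compaction explicitly, Hive accepts `ALTER TABLE ... PARTITION (...) COMPACT 'major'` on a per-partition basis. The automatic compactor is controlled on the metastore by `hive.compactor.initiator.on` and `hive.compactor.worker.threads`. Below is a minimal sketch that just generates one statement per partition. The table name, partition column, and partition values are hypothetical, and it assumes a single string-typed partition column whose values need no escaping.

```python
def major_compaction_statements(table, partition_col, partition_values):
    """Build one ALTER TABLE ... COMPACT 'major' statement per partition.

    Assumes a single string partition column and values without quotes.
    """
    statements = []
    for value in partition_values:
        statements.append(
            f"ALTER TABLE {table} PARTITION ({partition_col}='{value}') "
            "COMPACT 'major';"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table and partition values, for illustration only.
    for stmt in major_compaction_statements(
        "sales_acid", "ds", ["2015-10-01", "2015-10-02"]
    ):
        print(stmt)
    # SHOW COMPACTIONS reports compaction requests and their current state.
    print("SHOW COMPACTIONS;")
```

The output can be pasted into the Hive CLI or Beeline. Compaction requests are queued and run in the background, so check `SHOW COMPACTIONS` before re-running the query to compare the Tez task counts.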

Bottom line: we should refrain from deploying Transactions on HDP 2.2 releases.