Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Hive ACID table and low performance : select with just one mapper

avatar
New Member


When I create the table with this configuration:

set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;

CREATE TABLE IF NOT EXISTS falcon_alimentation.XX(....) CLUSTERED BY(key) INTO 5 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');

and load data:

INSERT INTO TABLE falcon_alimentation_socle.XX PARTITION(annee,mois,jour) SELECT * FROM falcon_alimentation_socle_prod.XXX LIMIT 1000000;

When I run a SELECT COUNT(x), Tez generates 7 mappers:

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      7          7        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.64 s
--------------------------------------------------------------------------------

But since I need an ACID table, I modify the previous configuration with:

set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

When I execute the same SELECT COUNT, why does Tez generate just one mapper?

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 7.26 s
--------------------------------------------------------------------------------

Do you have any idea?

1 ACCEPTED SOLUTION

avatar

Maybe you are hitting the following bug:

https://issues.apache.org/jira/browse/HIVE-9977

Running the compaction manually on the table will probably get you more Tez mappers.

View solution in original post

5 REPLIES 5

avatar

Maybe you are hitting the following bug:

https://issues.apache.org/jira/browse/HIVE-9977

Running the compaction manually on the table will probably get you more Tez mappers.
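For reference, a manual compaction can be requested with a statement like the one below; the table and partition names here are placeholders matching the original post, not something prescribed by Hive:

```sql
-- Request a major compaction on one partition: merges base and all delta
-- directories into a single new base, which lets the input format produce
-- multiple splits (and thus multiple Tez mappers) again.
ALTER TABLE falcon_alimentation.XX PARTITION (annee=2014, mois=10) COMPACT 'major';

-- The request is queued and picked up asynchronously by a compactor worker;
-- monitor its state (initiated / working / ready for cleaning) with:
SHOW COMPACTIONS;
```

Note that compaction requires `hive.compactor.initiator.on=true` and at least one worker thread (`hive.compactor.worker.threads`) on the metastore side, as in the configuration from the original post.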

avatar
Super Collaborator

The relevant bug is https://issues.apache.org/jira/browse/HIVE-13821. Triggering compaction is the right solution.

avatar
New Member

Hello,

I ran the compaction manually and it works. I will try to update Hive.

Thanks a lot.

avatar
New Member

Hi,

I verified the code, and the fixes for the two bugs are correctly deployed on our Hortonworks 2.3.4.7 platform.

So I don't understand why the compaction doesn't run automatically on the partitions, and why on delta files the number of mappers is always stuck at one.

Sometimes when I run a compaction manually with ALTER TABLE ..., the compaction does not run and a message is returned: No delta files found to compact in hdfs:// ... /test39/annee=2014/mois=10.

In this partition, I have the folders below:

delta_0014664_0014664
delta_0014720_0014720

Do you have any idea?

avatar
New Member

I resolved my problem. One transaction was still open, and no delta file with a higher transaction ID was being processed.
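For anyone hitting the same situation, the open transaction can be found and, on Hive versions that support it (HIVE-12634), aborted from the CLI; the transaction ID below is a placeholder, not one from this thread:

```sql
-- List open and aborted transactions; an old open transaction here blocks
-- the compactor from touching any delta with a higher transaction ID.
SHOW TRANSACTIONS;

-- Abort a stuck transaction by its ID so compaction can proceed.
ABORT TRANSACTIONS 14700;
```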

The code snippet that handles this case:

/**
 * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
 * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
 * compact the files, and thus treats only open transactions as invalid. Additionally any
 * txnId > highestOpenTxnId is also invalid. This is to avoid creating something like
 * delta_17_120 where txnId 80, for example, is still open.
 * @param txns txn list from the metastore
 * @return a valid txn list.
 */
public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
  //todo: this could be more efficient: using select min(txn_id) from TXNS where txn_state=" +
  // quoteChar(TXN_OPEN) to compute HWM...
  long highWater = txns.getTxn_high_water_mark();
  long minOpenTxn = Long.MAX_VALUE;
  long[] exceptions = new long[txns.getOpen_txnsSize()];
  int i = 0;
  for (TxnInfo txn : txns.getOpen_txns()) {
    if (txn.getState() == TxnState.OPEN) minOpenTxn = Math.min(minOpenTxn, txn.getId());
    exceptions[i++] = txn.getId(); //todo: only add Aborted
  }
  // remove all exceptions < minOpenTxn
  highWater = minOpenTxn == Long.MAX_VALUE ? highWater : minOpenTxn - 1;
  return new ValidCompactorTxnList(exceptions, -1, highWater);
}