
Hive ACID table and low performance: SELECT runs with just one mapper

Explorer


I create the table with this configuration:

```sql
set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;

CREATE TABLE IF NOT EXISTS falcon_alimentation.XX(....) CLUSTERED BY(key) into 5 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
```
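To illustrate what `CLUSTERED BY(key) into 5 BUCKETS` does, here is a simplified sketch of how Hive versions before 3.0 assign a row to a bucket by default: the bucketing column's hash, masked to a non-negative value, modulo the number of buckets. It assumes `key` is an INT column (the real column list is elided above); for an INT, Hive's hash is the value itself, while other types, strings in particular, hash differently, so this is not a general implementation.

```java
// Simplified sketch of Hive's pre-3.0 default bucket assignment for a single
// INT bucketing column. Assumption: "key" is an INT; other types hash differently.
class BucketSketch {
    static int bucketFor(int key, int numBuckets) {
        int hash = key; // Hive hashes an INT value to the value itself
        // Mask off the sign bit so negative keys still map to a valid bucket.
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 5; // CLUSTERED BY(key) INTO 5 BUCKETS
        for (int key : new int[] {0, 1, 7, 12, -3}) {
            System.out.println("key=" + key + " -> bucket " + bucketFor(key, numBuckets));
        }
    }
}
```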

Then I load data:

```sql
INSERT INTO TABLE falcon_alimentation_socle.XX PARTITION(annee,mois,jour) select * from falcon_alimentation_socle_prod.XXX limit 1000000;
```

When I run a SELECT COUNT(x), Tez generates 7 mappers:

```
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      7          7        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.64 s
-------------------------------------------------------------------------------
```

But I need an ACID table, so I modify the previous configuration as follows:

```sql
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```

Now when I run the same SELECT COUNT, Tez generates just one mapper. Why?

```
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 7.26 s
--------------------------------------------------------------------------------
```

Do you have any idea?

Accepted solution


Maybe you are hitting the following bug:

https://issues.apache.org/jira/browse/HIVE-9977

Also, running a compaction manually on the table will probably get you more Tez mappers.
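A manual compaction is requested in HiveQL with `ALTER TABLE <table> [PARTITION (<spec>)] COMPACT 'major'` (or `'minor'`). Hive queues the request and the compactor worker threads (`hive.compactor.worker.threads`, set in the question) carry it out; `SHOW COMPACTIONS` reports its progress. The Java helper below is hypothetical, not part of any Hive client library: it only assembles that statement for one partition, so it could be submitted over JDBC or pasted into Beeline. The `jour` value in `main` is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the HiveQL statement that requests a manual
// compaction of a table or of one partition of it.
class CompactionStatement {
    static String build(String table, Map<String, String> partitionSpec, String type) {
        if (!"major".equals(type) && !"minor".equals(type)) {
            throw new IllegalArgumentException("type must be 'major' or 'minor'");
        }
        StringBuilder sql = new StringBuilder("ALTER TABLE ").append(table);
        if (!partitionSpec.isEmpty()) {
            // Produces: PARTITION (col1='v1', col2='v2', ...)
            StringJoiner spec = new StringJoiner(", ", " PARTITION (", ")");
            for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
                spec.add(e.getKey() + "='" + e.getValue() + "'");
            }
            sql.append(spec);
        }
        return sql.append(" COMPACT '").append(type).append("'").toString();
    }

    public static void main(String[] args) {
        Map<String, String> partition = new LinkedHashMap<>(); // keeps column order
        partition.put("annee", "2014");
        partition.put("mois", "10");
        partition.put("jour", "15"); // hypothetical value
        System.out.println(build("falcon_alimentation_socle.XX", partition, "major"));
    }
}
```

A major compaction rewrites the base and all eligible deltas into a new base, which is what lets the reader split the data across more mappers again.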

Super Collaborator

The relevant bug is https://issues.apache.org/jira/browse/HIVE-13821. Triggering compaction is the right solution.

Explorer

Hello,

I ran the compaction manually and it works. I will try to update Hive.

Thanks a lot.

Explorer

Hi,

I checked the code, and the fixes for both bugs are included in our Hortonworks 2.3.4.7 platform.

So I don't understand why compaction doesn't run automatically on the partitions, and why reads over delta files are always limited to one mapper.

Sometimes, when I run a compaction manually with ALTER TABLE ..., the compaction does not run and this message is returned: No delta files found to compact in hdfs:// ... /test39/annee=2014/mois=10.

This partition contains the following folders:
delta_0014664_0014664
delta_0014720_0014720
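These folder names follow the ACID delta naming scheme `delta_<minTxnId>_<maxTxnId>`: each folder records the range of transaction IDs whose writes it holds, so here each folder contains exactly one transaction (14664 and 14720). A small sketch that decodes such names; the class and method names are hypothetical and not Hive's own API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hive): decodes a delta folder name of the
// form delta_<minTxnId>_<maxTxnId> into its transaction range.
class DeltaDirName {
    private static final Pattern DELTA = Pattern.compile("delta_(\\d+)_(\\d+)");

    final long minTxn; // lowest transaction id whose writes are in the folder
    final long maxTxn; // highest transaction id whose writes are in the folder

    private DeltaDirName(long minTxn, long maxTxn) {
        this.minTxn = minTxn;
        this.maxTxn = maxTxn;
    }

    static DeltaDirName parse(String folderName) {
        Matcher m = DELTA.matcher(folderName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a delta folder: " + folderName);
        }
        // Long.parseLong drops the zero padding ("0014664" -> 14664).
        return new DeltaDirName(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"delta_0014664_0014664", "delta_0014720_0014720"}) {
            DeltaDirName d = parse(name);
            System.out.println(name + " -> transactions " + d.minTxn + " to " + d.maxTxn);
        }
    }
}
```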

Do you have any idea?

Explorer

I resolved my problem. One transaction was still open, and none of the delta files with a higher transaction ID were being processed.

This is the code snippet that handles this case:

```java
/**
 * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
 * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
 * compact the files, and thus treats only open transactions as invalid. Additionally any
 * txnId > highestOpenTxnId is also invalid. This is to avoid creating something like
 * delta_17_120 where txnId 80, for example, is still open.
 * @param txns txn list from the metastore
 * @return a valid txn list.
 */
public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
  //todo: this could be more efficient: using select min(txn_id) from TXNS where txn_state=" +
  // quoteChar(TXN_OPEN) to compute HWM...
  long highWater = txns.getTxn_high_water_mark();
  long minOpenTxn = Long.MAX_VALUE;
  long[] exceptions = new long[txns.getOpen_txnsSize()];
  int i = 0;
  for (TxnInfo txn : txns.getOpen_txns()) {
    // Track the oldest transaction that is still OPEN.
    if (txn.getState() == TxnState.OPEN) {
      minOpenTxn = Math.min(minOpenTxn, txn.getId());
    }
    exceptions[i++] = txn.getId(); //todo: only add Aborted
  }
  //remove all exceptions < minOpenTxn
  // Cap the high-water mark just below the oldest open transaction, if there is one.
  highWater = minOpenTxn == Long.MAX_VALUE ? highWater : minOpenTxn - 1;
  return new ValidCompactorTxnList(exceptions, -1, highWater);
}
```
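To make the effect concrete, here is a simplified, self-contained re-implementation of the high-water-mark rule in the snippet above, using plain `long` values instead of Hive's metastore types. The compactor's limit is one below the oldest open transaction, and (as a simplification of how the compactor's transaction list judges a delta's range) a delta is only picked up if its highest transaction ID is at or below that limit. The transaction IDs in `main` are hypothetical: with a forgotten open transaction at 14600, both `delta_0014664_0014664` and `delta_0014720_0014720` lie above the limit, which is consistent with the "No delta files found to compact" message.

```java
import java.util.Arrays;

// Simplified re-implementation of the high-water-mark logic from
// createValidCompactTxnList, with plain longs instead of Hive types.
class CompactHighWater {
    /**
     * @param highWaterMark highest transaction id allocated so far
     * @param openTxnIds    ids of transactions still in the OPEN state
     * @return the highest transaction id the compactor may include
     */
    static long compactHighWater(long highWaterMark, long[] openTxnIds) {
        long minOpen = Arrays.stream(openTxnIds).min().orElse(Long.MAX_VALUE);
        return minOpen == Long.MAX_VALUE ? highWaterMark : minOpen - 1;
    }

    /** A delta qualifies only if every transaction it contains is at or below the limit. */
    static boolean eligible(long deltaMaxTxn, long limit) {
        return deltaMaxTxn <= limit;
    }

    public static void main(String[] args) {
        long highWaterMark = 14720;  // hypothetical: last transaction allocated
        long[] open = {14600};       // hypothetical: a transaction left open
        long limit = compactHighWater(highWaterMark, open);
        // With these values the limit is 14599, so neither delta qualifies.
        for (long maxTxn : new long[] {14664, 14720}) {
            System.out.println("delta ending at " + maxTxn + " eligible: " + eligible(maxTxn, limit));
        }
    }
}
```

Once the open transaction is committed or aborted, the limit moves back up to the high-water mark and both deltas become eligible again.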