Support Questions
Find answers, ask questions, and share your expertise

Hive ACID table and low performance : select with just one mapper

Explorer


When I create the table with this configuration:

set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;

CREATE TABLE IF NOT EXISTS falcon_alimentation.XX(....) CLUSTERED BY(key) INTO 5 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');

I load the data:

INSERT INTO TABLE falcon_alimentation_socle.XX PARTITION(annee,mois,jour) select * from falcon_alimentation_socle_prod.XXX limit 1000000;

When I run a select count(x), Tez generates 7 mappers:

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      7          7        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.64 s
-------------------------------------------------------------------------------

But I need an ACID table, so I modify the previous configuration with:

set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

When I execute the select count again, why does Tez generate just one mapper?

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 7.26 s
--------------------------------------------------------------------------------

Do you have any idea?

1 ACCEPTED SOLUTION

Maybe you are hitting the following bug:

https://issues.apache.org/jira/browse/HIVE-9977

Running the compaction manually on the table will probably get you more Tez mappers.
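For reference, a sketch of how a manual major compaction could be triggered on one partition of the table from the question (the partition values below are placeholders, not taken from the original post):

```sql
-- Trigger a manual major compaction on one partition of the ACID table
-- (partition values are illustrative placeholders)
ALTER TABLE falcon_alimentation.XX PARTITION (annee=2014, mois=10, jour=1) COMPACT 'major';

-- Watch the compaction request move through initiated/working/ready for cleaning
SHOW COMPACTIONS;
```

A major compaction rewrites the base and delta files into a new base, which should let the split computation produce more than one mapper again.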


5 REPLIES


Expert Contributor

The relevant bug is https://issues.apache.org/jira/browse/HIVE-13821. Triggering compaction is the right solution.

Explorer

Hello,

I ran the compaction manually and it works. I will try to update Hive.

Thanks a lot.

Explorer

Hi,

I verified the code, and the fixes for both bugs are indeed deployed on our Hortonworks 2.3.4.7 platform.

So I don't understand why the compaction doesn't run automatically on the partitions, and why with delta files the mappers are still stuck at one.

Sometimes when I run a compaction manually with ALTER TABLE ..., the compaction does not run and a message is returned: No delta files found to compact in hdfs:// ... /test39/annee=2014/mois=10.

In this partition, I have the folders below: delta_0014664_0014664 and delta_0014720_0014720
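When the compactor claims there are no deltas to compact despite delta directories being present, one thing worth checking is the transaction and compaction state in the metastore. A minimal diagnostic, assuming the DbTxnManager is configured as in the question:

```sql
-- List open and aborted transactions known to the metastore;
-- an old open transaction can make later deltas invisible to the compactor
SHOW TRANSACTIONS;

-- List compaction requests, including any that failed or were skipped
SHOW COMPACTIONS;
```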

Do you have any idea?

Explorer

I resolved my problem. One transaction was still open, and all delta files with a higher transaction number were not being processed.
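To recover from this situation, the blocking transaction first has to be identified and then either committed, rolled back by its owner, or aborted. A sketch, noting that ABORT TRANSACTIONS was only added in Hive 1.3.0/2.1.0 (HIVE-12634) and may not be available on older platforms; the transaction id below is a placeholder:

```sql
-- Find the stuck open transaction holding back compaction of later deltas
SHOW TRANSACTIONS;

-- On Hive 1.3.0 / 2.1.0 and later, abort it by id (placeholder id shown)
ABORT TRANSACTIONS 14660;
```

Once no older transaction remains open, the compactor's valid-transaction list advances and the deltas become eligible for compaction again.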

The code snippet that handles this case:

/**
 * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
 * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
 * compact the files, and thus treats only open transactions as invalid. Additionally any
 * txnId > highestOpenTxnId is also invalid. This is to avoid creating something like
 * delta_17_120 where txnId 80, for example, is still open.
 * @param txns txn list from the metastore
 * @return a valid txn list.
 */
public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
  //todo: this could be more efficient: using select min(txn_id) from TXNS where txn_state=" +
  // quoteChar(TXN_OPEN) to compute the HWM...
  long highWater = txns.getTxn_high_water_mark();
  long minOpenTxn = Long.MAX_VALUE;
  long[] exceptions = new long[txns.getOpen_txnsSize()];
  int i = 0;
  for (TxnInfo txn : txns.getOpen_txns()) {
    if (txn.getState() == TxnState.OPEN) minOpenTxn = Math.min(minOpenTxn, txn.getId());
    exceptions[i++] = txn.getId(); //todo: only add Aborted
  } //remove all exceptions < minOpenTxn
  highWater = minOpenTxn == Long.MAX_VALUE ? highWater : minOpenTxn - 1;
  return new ValidCompactorTxnList(exceptions, -1, highWater);
}