Created 07-15-2016 08:50 AM
I create the table with this configuration:
<code>set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
CREATE TABLE IF NOT EXISTS falcon_alimentation.XX(....)
CLUSTERED BY(key) INTO 5 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
Then I load the data:
<code>INSERT INTO TABLE falcon_alimentation_socle.XX PARTITION(annee,mois,jour)
SELECT * FROM falcon_alimentation_socle_prod.XXX LIMIT 1000000;
When I run a select count(x), Tez generates 7 mappers:
<code>--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      7          7        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.64 s
--------------------------------------------------------------------------------
But I need an ACID table, so I modified the previous configuration with:
<code>set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
When I then run the select count, why does Tez generate just one mapper?
<code>--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 7.26 s
--------------------------------------------------------------------------------
Do you have any idea why?
Created 07-15-2016 12:31 PM
Maybe you are hitting the following bug:
https://issues.apache.org/jira/browse/HIVE-9977
Also, running the compaction manually on the table will probably get you more Tez mappers.
Created 07-15-2016 06:40 PM
The relevant bug is https://issues.apache.org/jira/browse/HIVE-13821. Triggering compaction is the right solution.
Created 07-17-2016 04:31 PM
Hello,
I ran the compaction manually and it works. I will try to update Hive.
Thanks a lot.
Created 07-22-2016 08:11 AM
Hi,
I checked the code, and the fixes for the two bugs are correctly deployed on our Hortonworks 2.3.4.7 platform.
So I don't understand why the compaction doesn't run automatically on the partitions, and why queries on delta files are always limited to one mapper.
Sometimes when I run a compaction manually with ALTER TABLE ..., the compaction does not run and a message is returned: No delta files found to compact in hdfs:// ... /test39/ annee=2014/ mois=10.
In this partition, I have the folders below: delta_0014664_0014664 delta_0014720_0014720
Do you have any idea why?
Created 08-05-2016 01:42 PM
I resolved my problem. One transaction was still open, and none of the delta files with a higher transaction number were processed.
The code snippet that handles this case:
/**
 * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
 * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
 * compact the files, and thus treats only open transactions as invalid. Additionally any
 * txnId > highestOpenTxnId is also invalid. This is to avoid creating something like
 * delta_17_120 where txnId 80, for example, is still open.
 * @param txns txn list from the metastore
 * @return a valid txn list.
 */
public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
  //todo: this could be more efficient: using select min(txn_id) from TXNS where txn_state=" +
  // quoteChar(TXN_OPEN) to compute HWM...
  long highWater = txns.getTxn_high_water_mark();
  long minOpenTxn = Long.MAX_VALUE;
  long[] exceptions = new long[txns.getOpen_txnsSize()];
  int i = 0;
  for (TxnInfo txn : txns.getOpen_txns()) {
    if (txn.getState() == TxnState.OPEN) minOpenTxn = Math.min(minOpenTxn, txn.getId());
    exceptions[i++] = txn.getId();//todo: only add Aborted
  }//remove all exceptions < minOpenTxn
  highWater = minOpenTxn == Long.MAX_VALUE ? highWater : minOpenTxn - 1;
  return new ValidCompactorTxnList(exceptions, -1, highWater);
}
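To make the effect of that high-water-mark rule concrete, here is a minimal standalone sketch. It is not Hive code: the class `CompactorHighWaterMarkSketch`, its helper methods, and the open transaction id 14000 are hypothetical and chosen only for illustration. It assumes that a delta directory is eligible for compaction only when its highest transaction id is at or below the computed high-water mark, which is a simplification of the range check Hive actually performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; it does not use any Hive classes.
public class CompactorHighWaterMarkSketch {

    // Same rule as in the snippet above: if any transaction is still open,
    // the compactor's high-water mark drops to (lowest open txn id - 1).
    static long compactionHighWaterMark(long metastoreHighWaterMark, long[] openTxnIds) {
        long minOpenTxn = Long.MAX_VALUE;
        for (long id : openTxnIds) {
            minOpenTxn = Math.min(minOpenTxn, id);
        }
        return minOpenTxn == Long.MAX_VALUE ? metastoreHighWaterMark : minOpenTxn - 1;
    }

    // Parses the upper txn id from a directory name of the form delta_<min>_<max>.
    static long maxTxnOf(String deltaDirName) {
        String[] parts = deltaDirName.split("_");
        return Long.parseLong(parts[2]);
    }

    // Simplifying assumption: a delta is compactable only if its upper txn id
    // does not exceed the high-water mark.
    static List<String> compactableDeltas(List<String> deltaDirs, long highWater) {
        List<String> result = new ArrayList<>();
        for (String dir : deltaDirs) {
            if (maxTxnOf(dir) <= highWater) {
                result.add(dir);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deltas = Arrays.asList("delta_0014664_0014664", "delta_0014720_0014720");

        // Hypothetical: txn 14000 was never committed or aborted.
        long blocked = compactionHighWaterMark(14720L, new long[] {14000L});
        System.out.println(blocked + " -> " + compactableDeltas(deltas, blocked));

        // Once no transaction is open, the metastore high-water mark is used as-is.
        long free = compactionHighWaterMark(14720L, new long[] {});
        System.out.println(free + " -> " + compactableDeltas(deltas, free));
    }
}
```

With the stale transaction present, both delta directories sit above the high-water mark, so nothing is selected, which matches the "No delta files found to compact" message. Once the old transaction is closed, both directories become eligible.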