<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive ACID table and low performance : select with just one mapper in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131012#M34805</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I checked the code, and the fixes for the two bugs are correctly deployed on our Hortonworks 2.3.4.7 platform.&lt;/P&gt;&lt;P&gt;So I don't understand why compaction doesn't run automatically on the partitions, or why queries on the delta files are always limited to one mapper.&lt;/P&gt;&lt;P&gt;Sometimes when I run a compaction manually with ALTER TABLE ..., the compaction does not run and this message is returned: No delta files found to compact in hdfs:// ... /test39/ annee=2014/ mois=10.&lt;/P&gt;&lt;P&gt;In this partition, I have the folders below:
delta_0014664_0014664 delta_0014720_0014720&lt;/P&gt;&lt;P&gt;Do you have any idea?&lt;/P&gt;</description>
    <pubDate>Fri, 22 Jul 2016 15:11:08 GMT</pubDate>
    <dc:creator>ederf34</dc:creator>
    <dc:date>2016-07-22T15:11:08Z</dc:date>
    <item>
      <title>Hive ACID table and low performance : select with just one mapper</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131008#M34801</link>
      <description>&lt;P&gt;When I create the table with this configuration:&lt;/P&gt;&lt;PRE&gt;set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;

CREATE TABLE IF NOT EXISTS falcon_alimentation.XX(....) CLUSTERED BY(key) into 5 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
&lt;/PRE&gt;&lt;P&gt;Then I load the data:&lt;/P&gt;&lt;PRE&gt;INSERT INTO TABLE falcon_alimentation_socle.XX PARTITION(annee,mois,jour) select * from falcon_alimentation_socle_prod.XXX limit 1000000;
&lt;/PRE&gt;&lt;P&gt;When I run a select count(x), Tez generates 7 mappers:&lt;/P&gt;&lt;PRE&gt;--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      7          7        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================&amp;gt;&amp;gt;] 100%  ELAPSED TIME: 5.64 s
-------------------------------------------------------------------------------
&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;But&lt;/STRONG&gt; I need an ACID table, so I modified the previous configuration with:&lt;/P&gt;&lt;PRE&gt;set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
&lt;/PRE&gt;&lt;P&gt;Now when I run the select count, why does Tez generate just one mapper?&lt;/P&gt;&lt;PRE&gt;--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================&amp;gt;&amp;gt;] 100%  ELAPSED TIME: 7.26 s
--------------------------------------------------------------------------------
&lt;/PRE&gt;&lt;P&gt;Do you have any idea?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jul 2016 15:50:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131008#M34801</guid>
      <dc:creator>ederf34</dc:creator>
      <dc:date>2016-07-15T15:50:53Z</dc:date>
    </item>
    <item>
      <title>Re: Hive ACID table and low performance : select with just one mapper</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131009#M34802</link>
      <description>&lt;P&gt;Maybe you are hitting the following bug.&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HIVE-9977" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-9977&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Running the compaction manually on the table will probably give you more Tez mappers.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jul 2016 19:31:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131009#M34802</guid>
      <dc:creator>asinghal</dc:creator>
      <dc:date>2016-07-15T19:31:12Z</dc:date>
    </item>
    <item>
      <title>Re: Hive ACID table and low performance : select with just one mapper</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131010#M34803</link>
      <description>&lt;P&gt;The relevant bug is &lt;A href="https://issues.apache.org/jira/browse/HIVE-13821" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-13821&lt;/A&gt;. Triggering compaction is the right solution.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Jul 2016 01:40:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131010#M34803</guid>
      <dc:creator>ekoifman</dc:creator>
      <dc:date>2016-07-16T01:40:28Z</dc:date>
    </item>
    <item>
      <title>Re: Hive ACID table and low performance : select with just one mapper</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131011#M34804</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I ran the compaction manually and it works. I will try to update Hive.&lt;/P&gt;&lt;P&gt;Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Jul 2016 23:31:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131011#M34804</guid>
      <dc:creator>ederf34</dc:creator>
      <dc:date>2016-07-17T23:31:14Z</dc:date>
    </item>
    <item>
      <title>Re: Hive ACID table and low performance : select with just one mapper</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131012#M34805</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I checked the code, and the fixes for the two bugs are correctly deployed on our Hortonworks 2.3.4.7 platform.&lt;/P&gt;&lt;P&gt;So I don't understand why compaction doesn't run automatically on the partitions, or why queries on the delta files are always limited to one mapper.&lt;/P&gt;&lt;P&gt;Sometimes when I run a compaction manually with ALTER TABLE ..., the compaction does not run and this message is returned: No delta files found to compact in hdfs:// ... /test39/ annee=2014/ mois=10.&lt;/P&gt;&lt;P&gt;In this partition, I have the folders below:
delta_0014664_0014664 delta_0014720_0014720&lt;/P&gt;&lt;P&gt;Do you have any idea?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2016 15:11:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131012#M34805</guid>
      <dc:creator>ederf34</dc:creator>
      <dc:date>2016-07-22T15:11:08Z</dc:date>
    </item>
    <item>
      <title>Re: Hive ACID table and low performance : select with just one mapper</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131013#M34806</link>
      <description>&lt;P&gt;I resolved my problem. One transaction was still open, so none of the delta files with a higher transaction number were processed.&lt;/P&gt;&lt;P&gt;The code snippet that handles this case:&lt;/P&gt;&lt;PRE&gt;  /**
   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
   * compact the files, and thus treats only open transactions as invalid.  Additionally any
   * txnId &amp;gt; highestOpenTxnId is also invalid.  This is avoid creating something like
   * delta_17_120 where txnId 80, for example, is still open.
   * @param txns txn list from the metastore
   * @return a valid txn list.
   */
  public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
    //todo: this could be more efficient: using select min(txn_id) from TXNS where txn_state=" +
    // quoteChar(TXN_OPEN)  to compute compute HWM...
    long highWater = txns.getTxn_high_water_mark();
    long minOpenTxn = Long.MAX_VALUE;
    long[] exceptions = new long[txns.getOpen_txnsSize()];
    int i = 0;
    for (TxnInfo txn : txns.getOpen_txns()) {
      if (txn.getState() == TxnState.OPEN) minOpenTxn = Math.min(minOpenTxn, txn.getId());
      exceptions[i++] = txn.getId();//todo: only add Aborted
    }//remove all exceptions &amp;lt; minOpenTxn
    highWater = minOpenTxn == Long.MAX_VALUE ? highWater : minOpenTxn - 1;
    return new ValidCompactorTxnList(exceptions, -1, highWater);
  }&lt;/PRE&gt;</description>
      <pubDate>Fri, 05 Aug 2016 20:42:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-ACID-table-and-low-performance-select-with-just-one/m-p/131013#M34806</guid>
      <dc:creator>ederf34</dc:creator>
      <dc:date>2016-08-05T20:42:42Z</dc:date>
    </item>
  </channel>
</rss>

