<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Pre-splitting Hbase table not working in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pre-splitting-Hbase-table-not-working/m-p/132502#M34895</link>
    <description>&lt;P&gt;First of all, do you see 20 regions in the Web UI? If yes, check the data distribution per region (for every region you can get the total store size). You are probably hitting a single region because all your data keys are skewed. If you do not know the key distribution, it does not make sense to pre-split the table - leave it to HBase.&lt;/P&gt;</description>
    <pubDate>Mon, 18 Jul 2016 03:59:33 GMT</pubDate>
    <dc:creator>vrodionov</dc:creator>
    <dc:date>2016-07-18T03:59:33Z</dc:date>
    <item>
      <title>Pre-splitting Hbase table not working</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pre-splitting-Hbase-table-not-working/m-p/132501#M34894</link>
      <description>&lt;P&gt;I have created an HBase table using the commands below and pre-split the table into 20 regions: &lt;/P&gt;&lt;P&gt;hbase org.apache.hadoop.hbase.util.RegionSplitter test_rec_a UniformSplit -c 20 -f rec &lt;/P&gt;&lt;P&gt;alter 'test_rec_a', {METHOD =&amp;gt; 'table_att', CONFIGURATION =&amp;gt; {'SPLIT_POLICY' =&amp;gt; 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}},
{NAME=&amp;gt;'rec', DATA_BLOCK_ENCODING =&amp;gt; 'NONE', BLOOMFILTER =&amp;gt; 'ROWCOL', REPLICATION_SCOPE =&amp;gt; '0', VERSIONS =&amp;gt; '3', COMPRESSION =&amp;gt; 'NONE', TTL =&amp;gt;'5184000', BLOCKSIZE =&amp;gt; '65536', IN_MEMORY =&amp;gt; 'true', BLOCKCACHE =&amp;gt; 'true',MIN_VERSIONS =&amp;gt; '0',KEEP_DELETED_CELLS =&amp;gt; 'false'} &lt;/P&gt;&lt;P&gt;enable 'test_rec_a' &lt;/P&gt;&lt;P&gt;HBase config: max region size: 10 GB &lt;/P&gt;&lt;P&gt;
I ran 3 bulk load jobs with 2, 9, and 80 GB files; the files have all unique keys. &lt;/P&gt;&lt;P&gt;I was expecting the jobs to load data into all 20 regions, but the data was loaded into a single region only.
Is there something I am missing here?&lt;/P&gt;&lt;P&gt;
I am looking to pre-split the table into 20 regions, but I don't know the key distribution because the keys are hashed. &lt;/P&gt;&lt;P&gt;Is there a way to pre-split without knowing the key distribution, or is not pre-splitting the right option?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 03:28:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pre-splitting-Hbase-table-not-working/m-p/132501#M34894</guid>
      <dc:creator>sunny11541</dc:creator>
      <dc:date>2016-07-18T03:28:06Z</dc:date>
    </item>
    <item>
      <title>Re: Pre-splitting Hbase table not working</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pre-splitting-Hbase-table-not-working/m-p/132502#M34895</link>
      <description>&lt;P&gt;First of all, do you see 20 regions in the Web UI? If yes, check the data distribution per region (for every region you can get the total store size). You are probably hitting a single region because all your data keys are skewed. If you do not know the key distribution, it does not make sense to pre-split the table - leave it to HBase.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 03:59:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pre-splitting-Hbase-table-not-working/m-p/132502#M34895</guid>
      <dc:creator>vrodionov</dc:creator>
      <dc:date>2016-07-18T03:59:33Z</dc:date>
    </item>
    <item>
      <title>Re: Pre-splitting Hbase table not working</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pre-splitting-Hbase-table-not-working/m-p/132503#M34896</link>
      <description>&lt;P&gt;I would also say that you should understand the data you are loading to make sure that you are creating reasonable split points. Even if the keys are hashed, you should be able to determine what the first byte/character of the rowKey is and create reasonable split points (using RegionSplitter or by hand).&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 04:31:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pre-splitting-Hbase-table-not-working/m-p/132503#M34896</guid>
      <dc:creator>elserj</dc:creator>
      <dc:date>2016-07-18T04:31:28Z</dc:date>
    </item>
  </channel>
</rss>

