<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hbase export - split the sequence files with specific size limit in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hbase-export-split-the-sequence-files-with-specific-size/m-p/341983#M233667</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/97237"&gt;@arunr307&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Which CDH version are you using? Could you also attach the full output of your command? The command's help menu, shown below, lists no properties for controlling split size:&lt;/P&gt;&lt;P&gt;# hbase org.apache.hadoop.hbase.mapreduce.Export&lt;BR /&gt;ERROR: Wrong number of arguments: 0&lt;BR /&gt;Usage: Export [-D &amp;lt;property=value&amp;gt;]* &amp;lt;tablename&amp;gt; &amp;lt;outputdir&amp;gt; [&amp;lt;versions&amp;gt; [&amp;lt;starttime&amp;gt; [&amp;lt;endtime&amp;gt;]] [^[regex pattern] or [Prefix] to filter]]&lt;/P&gt;&lt;P&gt;Note: -D properties will be applied to the conf used.&lt;BR /&gt;For example:&lt;BR /&gt;-D mapreduce.output.fileoutputformat.compress=true&lt;BR /&gt;-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec&lt;BR /&gt;-D mapreduce.output.fileoutputformat.compress.type=BLOCK&lt;BR /&gt;Additionally, the following SCAN properties can be specified&lt;BR /&gt;to control/limit what is exported..&lt;BR /&gt;-D hbase.mapreduce.scan.column.family=&amp;lt;family1&amp;gt;,&amp;lt;family2&amp;gt;, ...&lt;BR /&gt;-D hbase.mapreduce.include.deleted.rows=true&lt;BR /&gt;-D hbase.mapreduce.scan.row.start=&amp;lt;ROWSTART&amp;gt;&lt;BR /&gt;-D hbase.mapreduce.scan.row.stop=&amp;lt;ROWSTOP&amp;gt;&lt;BR /&gt;-D hbase.client.scanner.caching=100&lt;BR /&gt;-D hbase.export.visibility.labels=&amp;lt;labels&amp;gt;&lt;BR /&gt;For tables with very wide rows consider setting the batch size as below:&lt;BR /&gt;-D hbase.export.scanner.batch=10&lt;BR /&gt;-D hbase.export.scanner.caching=100&lt;BR /&gt;-D mapreduce.job.name=jobName - use the specified mapreduce job name for the export&lt;BR /&gt;For MR performance consider the following properties:&lt;BR /&gt;-D mapreduce.map.speculative=false&lt;BR /&gt;-D mapreduce.reduce.speculative=false&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Will&lt;/P&gt;</description>
    <pubDate>Fri, 22 Apr 2022 09:44:13 GMT</pubDate>
    <dc:creator>willx</dc:creator>
    <dc:date>2022-04-22T09:44:13Z</dc:date>
    <item>
      <title>Hbase export - split the sequence files with specific size limit</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-export-split-the-sequence-files-with-specific-size/m-p/341148#M233451</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am trying to export a large HBase table into sequence files. Our requirement is to split the output sequence files by a specific size limit. We have tried multiple options, but the output part files are still generated with large sizes of around 6-7 GB. We used the sample export command below, aiming to generate sequence files in the 100 MB - 135 MB range, but it did not work as expected.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;hbase org.apache.hadoop.hbase.mapreduce.Export -D dfs.blocksize=1048576 -D mapred.min.split.size=102400 -D mapred.max.split.size=1048576 table_name /output/hbase/&lt;/LI-CODE&gt;&lt;P&gt;Could you please help me export the table in smaller chunks?&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2022 20:24:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-export-split-the-sequence-files-with-specific-size/m-p/341148#M233451</guid>
      <dc:creator>arunr307</dc:creator>
      <dc:date>2022-04-11T20:24:42Z</dc:date>
    </item>
    <item>
      <title>Re: Hbase export - split the sequence files with specific size limit</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-export-split-the-sequence-files-with-specific-size/m-p/341983#M233667</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/97237"&gt;@arunr307&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Which CDH version are you using? Could you also attach the full output of your command? The command's help menu, shown below, lists no properties for controlling split size:&lt;/P&gt;&lt;P&gt;# hbase org.apache.hadoop.hbase.mapreduce.Export&lt;BR /&gt;ERROR: Wrong number of arguments: 0&lt;BR /&gt;Usage: Export [-D &amp;lt;property=value&amp;gt;]* &amp;lt;tablename&amp;gt; &amp;lt;outputdir&amp;gt; [&amp;lt;versions&amp;gt; [&amp;lt;starttime&amp;gt; [&amp;lt;endtime&amp;gt;]] [^[regex pattern] or [Prefix] to filter]]&lt;/P&gt;&lt;P&gt;Note: -D properties will be applied to the conf used.&lt;BR /&gt;For example:&lt;BR /&gt;-D mapreduce.output.fileoutputformat.compress=true&lt;BR /&gt;-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec&lt;BR /&gt;-D mapreduce.output.fileoutputformat.compress.type=BLOCK&lt;BR /&gt;Additionally, the following SCAN properties can be specified&lt;BR /&gt;to control/limit what is exported..&lt;BR /&gt;-D hbase.mapreduce.scan.column.family=&amp;lt;family1&amp;gt;,&amp;lt;family2&amp;gt;, ...&lt;BR /&gt;-D hbase.mapreduce.include.deleted.rows=true&lt;BR /&gt;-D hbase.mapreduce.scan.row.start=&amp;lt;ROWSTART&amp;gt;&lt;BR /&gt;-D hbase.mapreduce.scan.row.stop=&amp;lt;ROWSTOP&amp;gt;&lt;BR /&gt;-D hbase.client.scanner.caching=100&lt;BR /&gt;-D hbase.export.visibility.labels=&amp;lt;labels&amp;gt;&lt;BR /&gt;For tables with very wide rows consider setting the batch size as below:&lt;BR /&gt;-D hbase.export.scanner.batch=10&lt;BR /&gt;-D hbase.export.scanner.caching=100&lt;BR /&gt;-D mapreduce.job.name=jobName - use the specified mapreduce job name for the export&lt;BR /&gt;For MR performance consider the following properties:&lt;BR /&gt;-D mapreduce.map.speculative=false&lt;BR /&gt;-D mapreduce.reduce.speculative=false&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Will&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2022 09:44:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-export-split-the-sequence-files-with-specific-size/m-p/341983#M233667</guid>
      <dc:creator>willx</dc:creator>
      <dc:date>2022-04-22T09:44:13Z</dc:date>
    </item>
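    <!--
None of the -D properties in the Export help output above sets a maximum size for the output files. The compression properties it lists can, however, make each sequence file smaller on disk. The sketch below is a hypothetical bash wrapper, assuming the hbase client is on PATH, that builds an Export command for the table and output directory from the question with those three compression properties added. TABLE, OUTDIR and DRY_RUN are illustrative variables, not HBase settings, and the sketch does not enforce the 100 MB - 135 MB target.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: assemble an HBase Export command that adds the
# compression properties listed in the Export help text.
# TABLE, OUTDIR and DRY_RUN are illustrative variables (assumptions).
set -euo pipefail

TABLE="${TABLE:-table_name}"        # placeholder table name from the question
OUTDIR="${OUTDIR:-/output/hbase/}"  # output directory from the question
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the command

cmd=(
  hbase org.apache.hadoop.hbase.mapreduce.Export
  -D mapreduce.output.fileoutputformat.compress=true
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
  -D mapreduce.output.fileoutputformat.compress.type=BLOCK
  "$TABLE" "$OUTDIR"
)

if [ "$DRY_RUN" = "1" ]; then
  # Print the assembled command without contacting the cluster.
  echo "${cmd[@]}"
else
  # Run the export; requires a host with the HBase client configured.
  "${cmd[@]}"
fi
```

Running it as-is only prints the assembled command; setting DRY_RUN=0 runs the export. How much compression shrinks each part file depends on the data, so the resulting sizes would still need to be checked on the cluster.
    -->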
    <item>
      <title>Re: Hbase export - split the sequence files with specific size limit</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-export-split-the-sequence-files-with-specific-size/m-p/342556#M233758</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/97237"&gt;@arunr307&lt;/a&gt;,&amp;nbsp;Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2022 08:24:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-export-split-the-sequence-files-with-specific-size/m-p/342556#M233758</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2022-04-28T08:24:49Z</dc:date>
    </item>
  </channel>
</rss>

