<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Status of Grouping Puts by RegionServer in HBase 1+? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95885#M9272</link>
    <description>&lt;P&gt;That's not what I was asking, but thanks.&lt;/P&gt;</description>
    <pubDate>Thu, 19 Nov 2015 03:57:18 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2015-11-19T03:57:18Z</dc:date>
    <item>
      <title>Status of Grouping Puts by RegionServer in HBase 1+?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95883#M9270</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Does anyone know whether this utility was renamed or deprecated? Is there an equivalent?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;HBase Client: Group Puts by RegionServer&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;In addition to using the writeBuffer, grouping Puts by RegionServer can reduce the number of client RPC calls per writeBuffer flush. There is a utility HTableUtil currently on TRUNK that does this, but you can either copy that or implement your own version for those still on 0.90.x or earlier.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Oct 2015 10:16:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95883#M9270</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2015-10-23T10:16:55Z</dc:date>
    </item>
    <item>
      <title>Re: Status of Grouping Puts by RegionServer in HBase 1+?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95884#M9271</link>
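      <description>&lt;P&gt;The put method of HBase's Table interface accepts either a single Put or a list of Puts, so you can call either &lt;EM&gt;myTable.put(new Put(...))&lt;/EM&gt; or &lt;EM&gt;myTable.put(List&amp;lt;Put&amp;gt;)&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;For example (assuming &lt;EM&gt;myTable&lt;/EM&gt; is an open &lt;EM&gt;Table&lt;/EM&gt; obtained from a &lt;EM&gt;Connection&lt;/EM&gt;):&lt;/P&gt;&lt;PRE&gt;// Requires java.util.ArrayList, java.util.List, org.apache.hadoop.hbase.client.Put
// and org.apache.hadoop.hbase.util.Bytes; the enclosing method must declare throws IOException.
byte[] myFamily = Bytes.toBytes("f1");
byte[] columnA = Bytes.toBytes("c1");
String valPrefix = "blub";
int numRows = 500000;
int batchSize = 1000;
List&amp;lt;Put&amp;gt; puts = new ArrayList&amp;lt;&amp;gt;(batchSize);
for (int row = 0; row &amp;lt; numRows; row++) {
	String value = valPrefix + row;

	// Create a Put; the row key is simply derived from the loop counter for illustration.
	Put put = new Put(Bytes.toBytes("row" + row));
	put.addColumn(myFamily, columnA, Bytes.toBytes(value));

	// Add to the current batch and send the batch once it is full.
	puts.add(put);
	if (puts.size() >= batchSize) {
		myTable.put(puts); // the client groups these Puts by RegionServer
		puts.clear();
	}
}
// Send whatever remains from the last, partial batch.
if (!puts.isEmpty()) {
	myTable.put(puts);
}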
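&lt;/PRE&gt;&lt;P&gt;To make "grouping Puts by RegionServer" concrete, here is a minimal, self-contained sketch of the bucketing step, similar in spirit to what HTableUtil did. It is plain Java, not HBase code: row keys stand in for Puts, and the &lt;EM&gt;serverFor&lt;/EM&gt; method, the server names and the "m" split point are all hypothetical. In a real HBase 1.0 client the lookup would come from &lt;EM&gt;RegionLocator.getRegionLocation(row).getServerName()&lt;/EM&gt;, and &lt;EM&gt;put(List&amp;lt;Put&amp;gt;)&lt;/EM&gt; already performs this grouping internally.&lt;/P&gt;&lt;PRE&gt;import java.util.Arrays;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupPutsByServer {
	// HYPOTHETICAL stand-in for RegionLocator.getRegionLocation(row).getServerName():
	// row keys sorting before "m" are hosted on rs1, all others on rs2.
	static String serverFor(String rowKey) {
		return rowKey.compareTo("m") >= 0 ? "rs2:16020" : "rs1:16020";
	}

	// Buckets row keys (standing in for Puts) by hosting server, keeping the
	// original order within each bucket. Each bucket would then be sent to
	// its server as a single multi-action RPC.
	static String bucketByServer(String... rowKeys) {
		var buckets = Arrays.stream(rowKeys)
				.collect(Collectors.groupingBy(GroupPutsByServer::serverFor,
						TreeMap::new, Collectors.toList()));
		return buckets.toString();
	}

	public static void main(String[] args) {
		// Prints {rs1:16020=[apple, kiwi, banana], rs2:16020=[zebra, mango]}
		System.out.println(bucketByServer("apple", "zebra", "kiwi", "mango", "banana"));
	}
}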
&lt;/PRE&gt;&lt;P&gt;You can also use the &lt;EM&gt;batch&lt;/EM&gt; method. The difference between &lt;EM&gt;batch&lt;/EM&gt; and &lt;EM&gt;put(List&amp;lt;Put&amp;gt;)&lt;/EM&gt; is that &lt;EM&gt;batch&lt;/EM&gt; accepts other actions as well, such as Gets and Deletes, and fills a results array with one entry per action.&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html"&gt;https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;void put(List&amp;lt;Put&amp;gt; puts) throws IOException &lt;/PRE&gt;&lt;P&gt;&lt;EM&gt;Puts some data in the table, in batch. This can be used for group commit, or for submitting user defined batches. The writeBuffer will be periodically inspected while the List is processed, so depending on the List size the writeBuffer may flush not at all, or more than once. &lt;/EM&gt;&lt;/P&gt;&lt;PRE&gt;void batch(List&amp;lt;? extends Row&amp;gt; actions, Object[] results) throws IOException, InterruptedException &lt;/PRE&gt;&lt;P&gt;&lt;EM&gt;Method that does a batch call on Deletes, Gets, Puts, Increments and Appends. The ordering of execution of the actions is not defined. Meaning if you do a Put and a Get in the same batch(java.util.List&amp;lt;? extends org.apache.hadoop.hbase.client.Row&amp;gt;, java.lang.Object[]) call, you will not necessarily be guaranteed that the Get returns what the Put had put.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;The &lt;A target="_blank" href="http://hbase.apache.org/book.html#perf.writing"&gt;"Writing to HBase"&lt;/A&gt; section of the HBase book is also worth reading; it covers batch writing and write performance, including options such as turning off the WAL (Write Ahead Log), which trades durability for speed.&lt;/P&gt;&lt;P&gt;If the goal is to reduce the number of RPC calls, have you considered HBase's &lt;A href="http://hbase.apache.org/book.html#arch.bulk.load"&gt;bulk loading capabilities&lt;/A&gt;, i.e. writing HFiles to HDFS (for example with a MapReduce job) and then loading them into the table directly, bypassing the normal write path?&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2015 03:13:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95884#M9271</guid>
      <dc:creator>jstraub</dc:creator>
      <dc:date>2015-11-19T03:13:40Z</dc:date>
    </item>
    <item>
      <title>Re: Status of Grouping Puts by RegionServer in HBase 1+?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95885#M9272</link>
      <description>&lt;P&gt;That's not what I was asking, but thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2015 03:57:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95885#M9272</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2015-11-19T03:57:18Z</dc:date>
    </item>
    <item>
      <title>Re: Status of Grouping Puts by RegionServer in HBase 1+?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95886#M9273</link>
      <description>&lt;P&gt;The answer: as of HBase 1.0, the client API groups Puts by RegionServer internally, so no additional code (such as HTableUtil) is needed to achieve it.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 23:50:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Status-of-Grouping-Puts-by-RegionServer-in-HBase-1/m-p/95886#M9273</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-02T23:50:46Z</dc:date>
    </item>
  </channel>
</rss>

