Created 10-23-2015 03:16 AM
Does anyone know if this utility was renamed or deprecated? Is there an equivalent?
HBase Client: Group Puts by RegionServer
In addition to using the writeBuffer, grouping Puts by RegionServer can reduce the number of client RPC calls per writeBuffer flush. There is a utility HTableUtil currently on TRUNK that does this, but you can either copy that or implement your own version for those still on 0.90.x or earlier.
Created 02-02-2016 03:50 PM
The answer is that the logic to group Puts by RegionServer is built into the HBase client API as of 1.0. No additional utility code is needed to achieve it.
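For anyone curious what "grouping by RegionServer" amounts to, here is a minimal sketch in plain Java with no HBase dependency. It is not the HTableUtil code or the client's internal implementation. The region boundaries, the server names (rs1, rs2) and the class and method names are all hypothetical, and Strings stand in for the byte[] row keys HBase actually uses. Each key is mapped to the region whose start key is the greatest one not larger than the key, and keys are then bucketed by the server hosting that region, so one RPC per bucket would suffice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByServerSketch {

    // Hypothetical region map: region start key -> server hosting that region.
    // A real client would obtain this from its region location lookup.
    public static TreeMap<String, String> exampleRegions() {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "rs1");   // region ["", "g")
        regions.put("g", "rs2");  // region ["g", "p")
        regions.put("p", "rs1");  // region ["p", end of table)
        return regions;
    }

    // Buckets row keys by the server that hosts their region.
    // Each map entry corresponds to one RPC in this simplified model.
    public static Map<String, List<String>> groupByServer(
            TreeMap<String, String> regions, List<String> rowKeys) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String rowKey : rowKeys) {
            // floorEntry finds the region with the greatest start key <= rowKey;
            // the empty start key guarantees a match for every row key.
            String server = regions.floorEntry(rowKey).getValue();
            grouped.computeIfAbsent(server, s -> new ArrayList<>()).add(rowKey);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("apple", "kiwi", "zebra", "mango");
        // prints {rs1=[apple, zebra], rs2=[kiwi, mango]}
        System.out.println(groupByServer(exampleRegions(), rowKeys));
    }
}
```

In this toy example four row keys spread over three regions collapse into two buckets, one per server, which is the RPC saving the quoted passage describes.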
Created 11-18-2015 07:13 PM
The put method of HBase's Table interface accepts either a single Put or a list of Puts, so you can call either mytable.put(new Put(...)) or mytable.put(List<Put>).
For example:
    String myFamily = "f1";
    String columnA = "c1";
    String valPrefix = "blub";
    int numRows = 500000;
    int batchSize = 1000;
    List<Put> puts = new ArrayList<Put>();
    for (int row = 0; row < numRows; row++) {
        String value = valPrefix + Integer.toString(row);
        // create put; the row index serves as the row key here (an assumption, substitute your own keys)
        Put put = new Put(Bytes.toBytes(Integer.toString(row)));
        put.addColumn(Bytes.toBytes(myFamily), Bytes.toBytes(columnA), Bytes.toBytes(value));
        // add to batch
        puts.add(put);
        if (puts.size() >= batchSize) {
            try {
                // send the whole batch; Table has no flushCommits(), put() sends the list
                myTable.put(puts);
            } catch (IOException e) {
                e.printStackTrace();
            }
            puts.clear();
        }
    }
    // if numRows is not a multiple of batchSize, put the remaining entries here
You can also use the batch-method. The only difference between batch and put-batch is that the batch-method accepts other actions as well, for example Gets.
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html
void put(List<Put> puts) throws IOException
Puts some data in the table, in batch. This can be used for group commit, or for submitting user defined batches. The writeBuffer will be periodically inspected while the List is processed, so depending on the List size the writeBuffer may flush not at all, or more than once.
void batch(List<? extends Row> actions, Object[] results) throws IOException, InterruptedException
Method that does a batch call on Deletes, Gets, Puts, Increments and Appends. The ordering of execution of the actions is not defined. Meaning if you do a Put and a Get in the same batch call, you will not necessarily be guaranteed that the Get returns what the Put had put.
Make sure you check out the section about "Writing to HBase" in the HBase book. It has some interesting information about batch writing/performance, e.g. turning off WAL (Write Ahead Log).
With regard to the number of RPC calls, have you considered the bulk loading capabilities of HBase (for example, writing files to HDFS and then using bulk import to load the data into HBase)?
Created 11-18-2015 07:57 PM
That's not what I was asking, but thanks.