Add the property to the custom hbase-site section in the Ambari HBase config.
You can use any one of the following options.
1) Change the program: set the property on the job configuration right after setting the ZooKeeper client port.
2) Add the property to the custom configs through Ambari and restart the cluster so the new configs take effect.
3) Add the property directly to /etc/conf/hbase-site.xml on the machine where you run the job, so that you do not need to change the program every time.
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>
Hi @venkateswarlu prudhvi,
Loading too many HFiles into a single column family (CF) can cause several problems:
1. Compactions come under heavy pressure (disk I/O and network I/O can become bottlenecks).
2. Read (get/scan) performance can suffer.
3. Write operations can be blocked.
Increasing 'hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily' and 'hbase.hstore.blockingStoreFiles' would let you get past the exception, but they only mitigate the 3rd problem above.
Usually, the main reason for this issue is that 'hbase.hregion.max.filesize' is too small for your data. Ideally, we should generate one HFile for each CF before bulk loading. To do this, set 'hbase.hregion.max.filesize' to a larger value in your program's configuration, after setting the ZooKeeper hosts/client port.
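A minimal sketch of that setting, assuming the job builds its configuration with HBaseConfiguration.create() and reusing the 10737418240 (10 GB) value from the hbase-site property quoted earlier in this thread. The class name and the ZooKeeper host and port are placeholders, not values from the original program, and the snippet needs the HBase client libraries on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hypothetical helper class; only the last conf.set(...) call is the point here.
public class BulkLoadConfig {
    public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();

        // Placeholder values: replace with your cluster's ZooKeeper quorum and port.
        conf.set("hbase.zookeeper.quorum", "zk-host.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        // Raise the maximum file size (10 GB here) so the HFile-generating job
        // produces fewer, larger HFiles per column family.
        conf.set("hbase.hregion.max.filesize", "10737418240");
        return conf;
    }
}
```

Pick a value larger than the data you expect per region and column family; 10 GB is only the figure used in this thread.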
One more thing: pre-split your table in advance into 100 (or maybe 1000) regions, evenly according to your data size, and set 'hbase.hregion.max.filesize' to '9223372036854775807' in the table description through the hbase shell, which means the table will not split automatically. To split regions manually, you could write a small script (e.g. split.rb):
split '<table_name>', '002'
split '<table_name>', '004'
split '<table_name>', '006'
split '<table_name>', '008'
and execute it by piping the script into the shell:
$ cat split.rb | hbase shell
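For 100 or 1000 regions, the split lines are easier to generate than to type. Here is a minimal sketch, assuming row keys are zero-padded decimal strings of a fixed width (as the '002', '004' examples suggest) and that keys are spread evenly over the key space. The class, method, and table names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitScriptGenerator {

    // Returns (regions - 1) evenly spaced split keys over [0, keySpaceSize),
    // each zero-padded to the given width.
    static List<String> splitKeys(long keySpaceSize, int regions, int width) {
        if (regions < 2 || keySpaceSize < regions) {
            throw new IllegalArgumentException("need regions >= 2 and keySpaceSize >= regions");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            long boundary = i * keySpaceSize / regions;
            keys.add(String.format("%0" + width + "d", boundary));
        }
        return keys;
    }

    // Renders one hbase shell "split" command per key.
    static String toShellScript(String tableName, List<String> keys) {
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            sb.append("split '").append(tableName).append("', '").append(key).append("'\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Key space 000..009 cut into 5 regions gives the keys 002, 004, 006, 008.
        System.out.print(toShellScript("my_table", splitKeys(10, 5, 3)));
    }
}
```

Redirect the output into split.rb and run it through the hbase shell as described above. If your row keys are not evenly distributed decimal strings, the boundaries should come from a sample of the real keys instead.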
I tried to use doBulkLoad; my code is below. I get the following error:

17/09/26 20:39:48 WARN mapreduce.LoadIncrementalHFiles: Bulk load operation did not find any files to load in directory /tmp/hfiles1. Does it contain files in subdirectories that correspond to column family names?
17/09/26 20:39:48 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
17/09/26 20:39:48 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x55e631c17bb08ba
17/09/26 20:39:48 INFO zookeeper.ZooKeeper: Session: 0x55e631c17bb08ba closed
17/09/26 20:39:48 INFO zookeeper.ClientCnxn: EventThread shut down
---------------------------------------
HTable table = new HTable(conf, tableName);
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
loader.doBulkLoad(new Path("/tmp/hfiles1"), table);
System.out.print("HFile sent");
ResultScanner scanner = table.getScanner("data".getBytes());
Result next = null;