I am getting java.io.IOException: Trying to load more than 32 hfiles to one family of one region Error when I am doing Bulk Load


Re: I am getting java.io.IOException: Trying to load more than 32 hfiles to one family of one region Error when I am doing Bulk Load

Mentor

Add the property to the Custom hbase-site section of the HBase configuration in Ambari.
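Presumably the property in question is the bulk-load HFile limit discussed elsewhere in this thread; the entry to add under Custom hbase-site would then look like the following (64 is just an example value):

<property>
  <name>hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily</name>
  <value>64</value>
</property>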


Re: I am getting java.io.IOException: Trying to load more than 32 hfiles to one family of one region Error when I am doing Bulk Load

You can follow any of the following approaches.

1) You can change the program by adding the following line after setting the ZooKeeper client port (a fuller sketch follows this list):

configuration.setInt(LoadIncrementalHFiles.MAX_FILES_PER_REGION_PER_FAMILY, 64);

2) You can add this property to the custom configs through Ambari and restart the cluster so the new configs take effect.

3) You can add this property directly to /etc/hbase/conf/hbase-site.xml on the machine where you are running the job, so that you need not change the program every time:

<property>
  <name>hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily</name>
  <value>64</value>
</property>
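For context, a rough end-to-end sketch of option 1 (this is not from the original post; it assumes the classic HBase 1.x client API used elsewhere in this thread, and the ZooKeeper quorum, table name, and HFile path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadWithHigherHFileLimit {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder ZooKeeper settings -- replace with your cluster's values.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.setInt("hbase.zookeeper.property.clientPort", 2181);
        // Raise the per-region, per-family HFile limit from its default of 32.
        conf.setInt(LoadIncrementalHFiles.MAX_FILES_PER_REGION_PER_FAMILY, 64);

        HTable table = new HTable(conf, "my_table");           // hypothetical table name
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("/tmp/hfiles"), table);     // hypothetical HFile directory
        table.close();
    }
}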


Re: I am getting java.io.IOException: Trying to load more than 32 hfiles to one family of one region Error when I am doing Bulk Load

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>


Re: I am getting java.io.IOException: Trying to load more than 32 hfiles to one family of one region Error when I am doing Bulk Load

Contributor

Hi @venkateswarlu prudhvi,

Loading too many HFiles into a single column family (CF) might:

1. put too much pressure on compactions (disk I/O and network I/O could become bottlenecks);

2. hurt read (get/scan) performance;

3. block write operations.

Increasing 'hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily' and 'hbase.hstore.blockingStoreFiles' would let you get past the exception, but it only mitigates the 3rd problem above.

Usually, the main reason for this issue is that 'hbase.hregion.max.filesize' is too small for your data. Ideally, you should generate one HFile for each CF before bulk loading. To do this, you could add the following line to your program after setting the ZooKeeper hosts/client port:

configuration.setLong("hbase.hregion.max.filesize", 9223372036854775807L);

One more thing: you should pre-split your HTable into 100 (or maybe 1000) regions evenly, according to your data size, in advance, and set 'hbase.hregion.max.filesize' to '9223372036854775807' in the table description via the hbase shell, which means the table will not split automatically. To split regions manually, you could write a small script (e.g. split.rb):

split '<table_name>', '002'
split '<table_name>', '004'
split '<table_name>', '006'
split '<table_name>', '008'

and execute it with:

$ cat split.rb | hbase shell
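If you create the table from code rather than the shell, you can pre-split it and disable automatic splits at creation time. A rough sketch against the HBase 1.x Admin API (the table name, column family, and split keys below are placeholders, not from the original post):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePreSplitTable {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
            desc.addFamily(new HColumnDescriptor("cf"));
            // Long.MAX_VALUE (9223372036854775807) effectively disables automatic region splits.
            desc.setMaxFileSize(Long.MAX_VALUE);

            // Evenly spaced split keys -- adjust to your actual row-key distribution.
            byte[][] splitKeys = new byte[][] {
                Bytes.toBytes("002"), Bytes.toBytes("004"),
                Bytes.toBytes("006"), Bytes.toBytes("008")
            };
            admin.createTable(desc, splitKeys);
        }
    }
}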

Re: I am getting java.io.IOException: Trying to load more than 32 hfiles to one family of one region Error when I am doing Bulk Load

New Contributor

@Victor Xu

How can we pre-split our HTable into 100 (or maybe 1000) regions using Scala/Spark? Can you please assist?


Re: I am getting java.io.IOException: Trying to load more than 32 hfiles to one family of one region Error when I am doing Bulk Load

Explorer
I tried to use doBulkLoad; below is my code. I get the following error:
17/09/26 20:39:48 WARN mapreduce.LoadIncrementalHFiles: Bulk load operation did not find any files to load in directory /tmp/hfiles1.  Does it contain files in subdirectories that correspond to column family names?
17/09/26 20:39:48 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
17/09/26 20:39:48 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x55e631c17bb08ba
17/09/26 20:39:48 INFO zookeeper.ZooKeeper: Session: 0x55e631c17bb08ba closed
17/09/26 20:39:48 INFO zookeeper.ClientCnxn: EventThread shut down



---------------------------------------

HTable table = new HTable(conf, tableName);
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
// Expects /tmp/hfiles1 to contain one subdirectory per column family.
loader.doBulkLoad(new Path("/tmp/hfiles1"), table);
System.out.println("HFile sent");
// Verify the load by scanning the 'data' column family.
ResultScanner scanner = table.getScanner("data".getBytes());
Result next = scanner.next();
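For what it's worth, that warning usually means the input directory is not laid out the way LoadIncrementalHFiles expects: one subdirectory per column family (the layout HFileOutputFormat2 typically produces), with the HFiles inside. A hypothetical layout for the 'data' family scanned in the code above would be:

/tmp/hfiles1/data/<hfile-1>
/tmp/hfiles1/data/<hfile-2>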