Guru

Here are the best practices for writing an HBase client application for HDP.

1. Use the new HBase 1.0 APIs instead of the old interfaces: instead of HTable, use Table; instead of HConnection, use Connection; and so on. Connection management has also changed, so that the connection lifecycle is best managed by the client application itself.

Check out these slide decks for more examples:

https://www.dropbox.com/s/v1x3djtlp1qg204/HBase%201.0%20API%20Changes%20-%20Meetup%2010_15_2014.pdf?...

http://www.slideshare.net/enissoz/meet-hbase-10

Further examples are available here:

https://github.com/hortonworks/hbase-release/tree/HDP-2.5.3.0-tag/hbase-examples
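As a rough illustration of the renames in item 1, here is a minimal sketch of a single write with the 1.0 API, with the deprecated pre-1.0 calls shown only in comments. The class name NewApiExample, the column family "cf", the qualifier "q" and the command-line arguments (table, row, value) are hypothetical choices for this example, not part of any HDP sample.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public final class NewApiExample {
  // Hypothetical column family and qualifier, for illustration only.
  public static final byte[] FAMILY = Bytes.toBytes("cf");
  public static final byte[] QUALIFIER = Bytes.toBytes("q");

  /** Builds a single-cell Put using addColumn (the 1.0 replacement for the deprecated Put.add). */
  public static Put buildPut(String rowKey, String value) {
    Put put = new Put(Bytes.toBytes(rowKey));
    put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(value));
    return put;
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Old, deprecated style:
    //   HConnection c = HConnectionManager.createConnection(conf);
    //   HTableInterface t = c.getTable("mytable");
    // New style: a Connection from ConnectionFactory, then a Table from that Connection.
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf(args[0]))) {
      table.put(buildPut(args[1], args[2]));
    }
  }
}
```

In a real application the Connection would normally be long-lived rather than created in main for one write; see item 3.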

2. Always close the Connection, Table, Scanner (ResultScanner) and Admin interfaces when you are done with them. These interfaces implement Closeable and hold resources on the client side, or on the server side in the case of Scanner, so closing them properly is very important for performance. Java's try-with-resources syntax is an easy way to close these resources automatically:

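// conf is a Configuration, tableName a TableName
try (Connection connection = ConnectionFactory.createConnection(conf);
     Admin admin = connection.getAdmin();
     Table table = connection.getTable(tableName)) {
  Result result = table.get(new Get(...));
}  // table, admin and connection are closed automatically, in reverse order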
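Item 2 also applies to scanners, which the snippet above does not show. Below is a minimal sketch, assuming a caller that already holds a shared Connection, where both the Table and the ResultScanner are declared in the try-with-resources header so the server-side scanner is released even if iteration throws. The class ScanExample, the prefix-based scan and the caching value of 100 are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public final class ScanExample {
  /** Builds a Scan over all rows whose key starts with the given prefix (client-side only). */
  public static Scan prefixScan(String prefix) {
    Scan scan = new Scan();
    scan.setRowPrefixFilter(Bytes.toBytes(prefix)); // sets start and stop rows for the prefix
    scan.setCaching(100); // hypothetical value: rows fetched per RPC
    return scan;
  }

  /** Collects row keys under a prefix; Table and ResultScanner are both closed on exit. */
  public static List<String> rowKeysWithPrefix(Connection connection, TableName tableName,
      String prefix) throws IOException {
    List<String> keys = new ArrayList<>();
    try (Table table = connection.getTable(tableName);
         ResultScanner scanner = table.getScanner(prefixScan(prefix))) {
      for (Result result : scanner) {
        keys.add(Bytes.toString(result.getRow()));
      }
    } // closing the scanner releases its resources on the region server
    return keys;
  }
}
```

Copying what you need out of each Result into plain Java objects, as done here with the row keys, also keeps Result instances confined to the thread that produced them.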

3. Make sure you understand the creation cost, lifecycle and thread safety of Connection, Table and similar interfaces. In short, Connection is thread-safe and very heavyweight (it owns the underlying ZooKeeper connection, socket connections, etc.), so it should be created once per application and shared across threads. Table, Admin, etc., on the other hand, are lightweight and NOT thread-safe. See the links above for more documentation on these interfaces. Typically, you would open a Connection and close it only when the application shuts down, while Table and Admin objects can be created and closed per request.
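One way to apply item 3 is a single application-wide holder for the Connection, with a fresh Table opened and closed inside each request. The sketch below assumes a lazily created Connection guarded by double-checked locking; the class name SharedHBaseConnection and its methods are hypothetical, not an HBase API.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

/** Hypothetical holder for one application-wide Connection. */
public final class SharedHBaseConnection {
  // volatile so other threads see a fully constructed Connection (double-checked locking).
  private static volatile Connection connection;

  private SharedHBaseConnection() {}

  /** Returns the shared, thread-safe Connection, creating it on first use. */
  public static Connection get() throws IOException {
    Connection c = connection;
    if (c == null) {
      synchronized (SharedHBaseConnection.class) {
        c = connection;
        if (c == null) {
          c = ConnectionFactory.createConnection(HBaseConfiguration.create());
          connection = c;
        }
      }
    }
    return c;
  }

  /** True once the Connection has been created and not yet closed. */
  public static boolean isOpen() {
    Connection c = connection;
    return c != null && !c.isClosed();
  }

  /** Call once, when the application shuts down. */
  public static synchronized void shutdown() throws IOException {
    if (connection != null) {
      connection.close();
      connection = null;
    }
  }

  /** Per request: Table is cheap and NOT thread-safe, so each call opens and closes its own. */
  public static byte[] readCell(TableName tableName, byte[] row, byte[] family, byte[] qualifier)
      throws IOException {
    try (Table table = get().getTable(tableName)) {
      Result result = table.get(new Get(row).addColumn(family, qualifier));
      return result.getValue(family, qualifier); // null if the cell does not exist
    }
  }
}
```

In a web application, shutdown() would typically be called from whatever hook the container offers when the application is undeployed, for example a ServletContextListener; request threads never close the shared Connection themselves.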

4. Use BufferedMutator for streaming / batch Puts. BufferedMutator replaces HTable.setAutoFlush(false) and is the supported API for high-performance streaming writes.
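A minimal sketch of item 4, assuming a shared Connection from item 3: Puts are handed to a BufferedMutator, which batches them client-side and sends them when its write buffer fills, and close() flushes whatever is left. Because failures surface asynchronously, an ExceptionListener is registered to report rows that could not be written. The class name BufferedWriteExample, the family "cf", the qualifier "q" and the 4 MB buffer size are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public final class BufferedWriteExample {
  // Hypothetical column family, qualifier and buffer size, for illustration only.
  public static final byte[] FAMILY = Bytes.toBytes("cf");
  public static final byte[] QUALIFIER = Bytes.toBytes("q");
  public static final long WRITE_BUFFER_BYTES = 4L * 1024 * 1024;

  /** Turns row-key/value pairs into single-cell Puts. */
  public static List<Put> toPuts(Map<String, String> rows) {
    List<Put> puts = new ArrayList<>(rows.size());
    for (Map.Entry<String, String> e : rows.entrySet()) {
      Put put = new Put(Bytes.toBytes(e.getKey()));
      put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(e.getValue()));
      puts.add(put);
    }
    return puts;
  }

  /** Streams the rows through a BufferedMutator; close() flushes any buffered mutations. */
  public static void write(Connection connection, TableName tableName, Map<String, String> rows)
      throws IOException {
    // Mutations that exhaust their retries are reported here, not thrown from mutate().
    BufferedMutator.ExceptionListener listener = (e, mutator) -> {
      for (int i = 0; i < e.getNumExceptions(); i++) {
        System.err.println("Failed to write row " + Bytes.toString(e.getRow(i).getRow()));
      }
    };
    BufferedMutatorParams params = new BufferedMutatorParams(tableName)
        .listener(listener)
        .writeBufferSize(WRITE_BUFFER_BYTES);
    try (BufferedMutator mutator = connection.getBufferedMutator(params)) {
      mutator.mutate(toPuts(rows));
    }
  }
}
```

For a long-running ingest loop you would keep one BufferedMutator open, call mutate() repeatedly, and call flush() only at points where the data must be durable, rather than opening a new one per batch as this short sketch does.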

5. Make sure that hbase-site.xml is loaded from the classpath instead of manually calling conf.set() for hbase.zookeeper.quorum, etc. When

Configuration conf = HBaseConfiguration.create();

is called, HBase looks for the file named "hbase-site.xml" in all of the DIRECTORIES on the classpath. Thus, if the application adds /etc/hbase/conf (the default location on HDP) to its classpath at startup, there is no need to call conf.set() manually for client settings. Applications should especially do this because other client-level configuration settings from the Ambari deployment are then picked up automatically, without code changes.
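A quick way to check that item 5 is working is to ask the Configuration where it found hbase-site.xml. The sketch below, with the hypothetical class name ConfigCheck, prints that location and the resulting ZooKeeper quorum. Note that the classpath entry must be the directory (for example /etc/hbase/conf), not the hbase-site.xml file itself.

```java
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public final class ConfigCheck {
  /** Location of hbase-site.xml on the classpath, or null if it was not found. */
  public static URL hbaseSiteLocation(Configuration conf) {
    return conf.getResource("hbase-site.xml");
  }

  public static void main(String[] args) {
    // Loads hbase-default.xml and then hbase-site.xml from the classpath; no conf.set() calls.
    Configuration conf = HBaseConfiguration.create();
    System.out.println("hbase-site.xml found at: " + hbaseSiteLocation(conf));
    // hbase-default.xml ships "localhost" here, so seeing it on a client host
    // usually means hbase-site.xml was not picked up.
    System.out.println("hbase.zookeeper.quorum: " + conf.get("hbase.zookeeper.quorum"));
  }
}
```

If the first line prints null, the configuration directory is most likely missing from the classpath used to launch the application.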

6. Make sure to depend on the correct version of the client jars. Applications should always depend on the HDP version of the artifacts from the HDP repo instead of the Apache version. HDP versions of components are usually binary and wire compatible with the corresponding Apache base versions. However, the HDP client jars may contain fixes that the application would otherwise not get, which can lead to hard-to-debug issues. See

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.8.0/bk_user-guide/content/user-guide-setup-mav...

for an example.
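To confirm which client jar actually ended up on the runtime classpath (for instance, when a transitive dependency pulls in an Apache artifact), the version compiled into hbase-common can be printed at startup. This is a minimal sketch with the hypothetical class name ClientVersionCheck; HDP artifact versions typically append the HDP build to the Apache version, so a plain Apache version string is a hint that the wrong artifact was resolved.

```java
import org.apache.hadoop.hbase.util.VersionInfo;

public final class ClientVersionCheck {
  /** Version string compiled into the hbase-common jar found on the classpath. */
  public static String clientVersion() {
    return VersionInfo.getVersion();
  }

  public static void main(String[] args) {
    System.out.println("HBase client version: " + clientVersion());
    System.out.println("Source revision:      " + VersionInfo.getRevision());
  }
}
```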

Enis

Comments
New Contributor

Do you mean the new HBase 2.0 APIs and not the HBase 1.0 APIs?

Master Mentor

@Mark Smith this is in reference to the current API which is 1.x.

New Contributor

I'm a bit confused. According to item 2, we should always close the Connection for the best performance, but according to item 3, we should create a single Connection per application.

So what is the best practice: holding a single connection, or opening and closing it for each user request?

Guru

Thanks. Fixed the link.

Guru

You should close the Connection / Admin / Table WHEN you are done. According to #3, Connection creation is very costly, so you want to share the connection as much as possible, which means you won't be "done" with the Connection until the application shuts down. Close the Connection when you know you won't be making any more requests. Table and Admin, however, are relatively cheap, so you should open and close those per request.

New Contributor

I know this is a bit late to post, but I have a web app that scans the table and returns results based on the row key provided in the call, so it needs to support multithreading. Here's a snippet of the scan:

try(ResultScanner scanner = myTable.getScanner(scan)) {
	for (Result result : scanner) {
		//logic of result.getValue() and result.getRow()
	}
}

I just saw that Result (https://hbase.apache.org/1.2/devapidocs/org/apache/hadoop/hbase/client/Result.html) is, like others mentioned in this article, one of the classes that are not thread-safe. Is there an example of a fully thread-safe HBase app that scans results based on a provided row key, or anything similar? I'm looking for an efficient, well-written example I can use as a reference. I'm now concerned that this code might not return correct results when it receives simultaneous requests.