
Here are the best practices for writing an HBase client application for HDP.

1. Use the new HBase 1.0 APIs instead of the old interfaces: use Table instead of HTable, Connection instead of HConnection, and so on. Connection management has also changed: the client application now creates and owns the Connection, and is expected to manage its lifecycle itself.

Check out these slide decks for more examples:

https://www.dropbox.com/s/v1x3djtlp1qg204/HBase%201.0%20API%20Changes%20-%20Meetup%2010_15_2014.pdf?...

http://www.slideshare.net/enissoz/meet-hbase-10

Further examples are available here:

https://github.com/hortonworks/hbase-release/tree/HDP-2.5.3.0-tag/hbase-examples
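
For comparison with the old HTable style, here is a minimal sketch of the 1.0-style write and read path. It assumes hbase-client 1.x on the classpath, hbase-site.xml reachable on the classpath, and a running cluster with a hypothetical table `demo_table` that has a column family `cf`; those names are illustrative only.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NewApiExample {
  // Hypothetical table, family and qualifier names used only for illustration.
  static final TableName TABLE = TableName.valueOf("demo_table");
  static final byte[] CF = Bytes.toBytes("cf");
  static final byte[] QUALIFIER = Bytes.toBytes("q");

  // Builds a Put with the 1.0-style addColumn(), which replaces the deprecated Put.add().
  static Put buildPut(String row, String value) {
    Put put = new Put(Bytes.toBytes(row));
    put.addColumn(CF, QUALIFIER, Bytes.toBytes(value));
    return put;
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Old style: new HTable(conf, "demo_table") or HConnectionManager.createConnection(conf).
    // New style: the application creates and owns the Connection, then asks it for a Table.
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TABLE)) {
      table.put(buildPut("row1", "hello"));
      Result result = table.get(new Get(Bytes.toBytes("row1")));
      System.out.println(Bytes.toString(result.getValue(CF, QUALIFIER)));
    }
  }
}
```

Only `main()` needs a live cluster; `buildPut()` just constructs the client-side Put object.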

2. Always close Connection, Table, Scanner and Admin instances when you are done with them. These interfaces implement Closeable and hold resources on the client side, or on the server side in the case of Scanner, so closing them properly is very important for performance. Java 7's try-with-resources statement is an easy way to close these resources automatically:

try (Connection connection = ConnectionFactory.createConnection(conf);
     Admin admin = connection.getAdmin();
     Table table = connection.getTable(tableName)) {
  // All three resources are closed automatically, in reverse order, when this block exits
  table.get(new Get(...));
}
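
Scanners deserve particular care, because an unclosed ResultScanner keeps a scanner open on the RegionServer until its lease expires. A minimal sketch, assuming hbase-client 1.x, a running cluster, and the same hypothetical `demo_table` with column family `cf`:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
  // Builds a Scan restricted to one column family ("cf" is a hypothetical name).
  static Scan buildScan(String family) {
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes(family));
    scan.setCaching(100); // rows fetched per RPC; illustrative value
    return scan;
  }

  public static void main(String[] args) throws IOException {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("demo_table"));
         // Declared last, so it is closed first, releasing the server-side scanner.
         ResultScanner scanner = table.getScanner(buildScan("cf"))) {
      for (Result row : scanner) {
        System.out.println(Bytes.toString(row.getRow()));
      }
    }
  }
}
```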

3. Make sure you understand the creation cost, lifecycle and thread safety of Connection, Table and similar interfaces. In short, Connection is thread-safe and very heavyweight (it owns the underlying ZooKeeper connection, socket connections, etc.), so it should be created once per application and shared across threads. Table, Admin, etc., on the other hand, are lightweight and NOT thread-safe. See the links above for more documentation on these interfaces. Typically, you would open a Connection at startup and close it only when the application shuts down, while Table and Admin objects can be created and closed per request.
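
The pattern above can be sketched as follows: one Connection shared by a thread pool, with each task opening and closing its own Table. This assumes hbase-client 1.x and a running cluster; the table name `demo_table`, the pool size and the row keys are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionExample {
  static final TableName TABLE = TableName.valueOf("demo_table"); // hypothetical table

  // One request: get a cheap, non-thread-safe Table from the shared Connection, then close it.
  static boolean rowExists(Connection connection, String row) throws IOException {
    try (Table table = connection.getTable(TABLE)) {
      return table.exists(new Get(Bytes.toBytes(row)));
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    // Created once, shared by all worker threads, closed when the application is done.
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
      List<Future<Boolean>> results = new ArrayList<>();
      for (int i = 0; i < 8; i++) {
        final String row = "row" + i;
        results.add(pool.submit(() -> rowExists(connection, row)));
      }
      for (Future<Boolean> f : results) {
        System.out.println(f.get()); // wait for every task before the Connection is closed
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

The Connection is never handed to a task for closing; only the per-request Table is closed inside `rowExists()`.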

4. Use BufferedMutator for streaming / batch Puts. BufferedMutator replaces HTable.setAutoFlush(false) and is the supported API for high-performance streaming writes.
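
A minimal BufferedMutator sketch, assuming hbase-client 1.x, a running cluster and the hypothetical `demo_table` / `cf` names. The 4 MB buffer size is an illustrative value, not a recommendation.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedMutatorExample {
  static final byte[] CF = Bytes.toBytes("cf"); // hypothetical column family

  // Table name and buffer size are illustrative values.
  static BufferedMutatorParams params() {
    return new BufferedMutatorParams(TableName.valueOf("demo_table"))
        .writeBufferSize(4L * 1024 * 1024)
        .listener((RetriesExhaustedWithDetailsException e, BufferedMutator m) -> {
          // Called for mutations that still failed after all retries.
          for (int i = 0; i < e.getNumExceptions(); i++) {
            System.err.println("Failed to write row " + Bytes.toString(e.getRow(i).getRow()));
          }
        });
  }

  public static void main(String[] args) throws IOException {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         BufferedMutator mutator = connection.getBufferedMutator(params())) {
      for (int i = 0; i < 10000; i++) {
        Put put = new Put(Bytes.toBytes("row" + i));
        put.addColumn(CF, Bytes.toBytes("q"), Bytes.toBytes(i));
        mutator.mutate(put); // buffered on the client; sent once the write buffer fills
      }
      mutator.flush(); // optional here, since close() also flushes any remaining mutations
    }
  }
}
```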

5. Make sure that hbase-site.xml is sourced instead of manually calling conf.set() for hbase.zookeeper.quorum, etc. When

Configuration conf = HBaseConfiguration.create();

is called, HBase looks for a file named "hbase-site.xml" on the classpath. Note that it is the DIRECTORY containing the file that must be on the classpath, not the file itself. Thus, if the application adds /etc/hbase/conf (the default location on HDP) to its classpath at startup, there is no need to call conf.set() manually for client settings. Applications should do this in particular because other client-level configuration settings deployed through Ambari then get picked up by the client application automatically, without code changes.
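
A quick way to confirm that the configuration really comes from the cluster's hbase-site.xml is to print where it was found. This is a sketch assuming hbase-client 1.x; it would be launched with /etc/hbase/conf ahead of the application jars on the classpath (for example `java -cp /etc/hbase/conf:<application jars> ConfigCheck`).

```java
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ConfigCheck {
  // Returns where hbase-site.xml was found on the classpath, or null if it was not found.
  static URL siteXmlLocation(Configuration conf) {
    return conf.getResource("hbase-site.xml");
  }

  public static void main(String[] args) {
    // Loads hbase-default.xml and hbase-site.xml from the classpath; no conf.set() calls needed.
    Configuration conf = HBaseConfiguration.create();
    URL site = siteXmlLocation(conf);
    if (site == null) {
      System.err.println("hbase-site.xml not found; is /etc/hbase/conf on the classpath?");
    } else {
      System.out.println("Using " + site);
    }
    System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
  }
}
```

If the file is missing, the quorum falls back to the value in hbase-default.xml, which is usually not what a cluster client wants.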

6. Make sure to depend on the correct version of the client jars. Applications should always depend on the HDP versions of the artifacts from the HDP repository rather than the Apache versions. HDP versions of components are usually binary- and wire-compatible with the base Apache versions. However, the HDP client jars may contain fixes that the application would otherwise miss, which can lead to hard-to-debug problems. See

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.8.0/bk_user-guide/content/user-guide-setup-mav...

for an example.
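
To check at runtime which client build actually ended up on the classpath, the version compiled into the HBase jars can be printed. A small sketch assuming hbase-common 1.x; HDP builds typically append the HDP release to the Apache version string, so a plain Apache version suggests the wrong artifacts were resolved.

```java
import org.apache.hadoop.hbase.util.VersionInfo;

public class ClientVersionCheck {
  // Version string baked into the hbase-common jar that is actually on the classpath.
  static String clientVersion() {
    return VersionInfo.getVersion();
  }

  public static void main(String[] args) {
    System.out.println("HBase client version: " + clientVersion());
    System.out.println("Revision: " + VersionInfo.getRevision());
  }
}
```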

Enis

Comments

Do you mean the new HBase 2.0 APIs and not the HBase 1.0 APIs?


@Mark Smith this is in reference to the current API which is 1.x.


I'm a bit confused. According to item 2, we should always close the Connection for best performance. But according to item 3, we should create a single Connection per application.

So what is the best practice: holding a single connection, or opening and closing it for each user request?


Thanks. Fixed the link.


You should close the Connection / Admin / Table WHEN you are done with them. As #3 says, creating a Connection is very costly, so you want to share it as much as possible, which means you won't be "done" with the Connection until the application shuts down. Close the Connection once you know you won't be making any more requests. Table and Admin, however, are relatively cheap, so you should open and close those per request.

Version history: revision 1 of 1, last updated 11-17-2015 01:11 AM