Here are the best practices for writing an HBase client application for HDP.
1. Use the new HBase-1.0 API's instead of old interfaces. Instead of HTable, use Table, instead of HConnection, use Connection, etc. Also the Connection management has been changed so that the connection lifecycle management is best performed by the client application.
2. Always close the Connection, Table, Scanner and Admin interfaces when you are done. These interfaces implement Closeable, and holds resources on the client side or on the server side (Scanner). Thus properly closing these is very important for best performance. Java's new try syntax is an easy way to auto-close these resources in your code:
3. Make sure to understand creation-cost, lifecycle and thread-safety of Connection, Table and similar interfaces. In short, Connection is thread safe, and very heavy-weight (owns the underlying zookeeper connection, socket connections, etc), thus it should be created once per application and shared across threads. Table, Admin, etc on the other hand are light weight and NOT thread-safe. Check the above links for more documentation for these interfaces. Typically, you would open a Connection and only close that Connection when the application shuts down. Table and Admin objects can be created and closed per request.
4. Use BufferedMutator for streaming / batch Puts. BufferedMutator replaces HTable.setAutoFlush(false) and is the supported high-performance streaming writes API.
5. Make sure that hbase-site.xml is sourced instead of manually calling conf.set() for hbase.zookeeper.quorum, etc. When
Configuration conf = HBaseConfiguration.create();
is called HBase looks for the file named "hbase-site.xml" in all of the DIRECTORIES in the classpath. Thus, if the application adds /etc/hbase/conf (which is the default location for HDP) to its classpath at the start, there is no need to manually call conf.set() for client settings. Applications should especially do this, since other client-level configuration settings coming from the Ambari deployment automatically gets picked up by the client application without code change.
6. Make sure to depend on the correct version of client jars. Applications should always depend on the HDP version of the artifacts coming from the HDP repo instead of the Apache version of the artifacts. HDP versions of components are usually mostly binary and wire compatible with the base versions of Apache components. However, there maybe fixes to the client jars that the application would otherwise will not see and hence result in hard to debug cases. See