How to configure Impala/Hive2 JDBC driver in Apache Tomcat without ClassNotFound exceptions

I am attempting to add the Impala/Hive2 JDBC driver to an existing application that is deployed with Tomcat 7 (7.0.54). Other JDBC drivers (Oracle, Microsoft SQL Server, Sybase) work fine.

However, the driver never connected and always threw exceptions that my Java code (JDK 1.7.0_55) could not catch when run in Eclipse (Kepler). This is being done on a Lenovo T500 laptop running Windows 8.1. There were no meaningful error messages or stack traces.

To find out where the errors occur, I got the Hive source code and wrote a simple Java program to test the JDBC connection, stepping through it in DEBUG. There were reference errors for missing classes that are not in the jars specified in [Cloudera's documentation][1]. The errors are in HiveConnection.java (org.apache.hive.jdbc) for:

    import org.apache.hadoop.hive.ql.session.SessionState;  
    import org.apache.http.impl.client.DefaultHttpClient;  


I tracked down where these classes exist and added the following jars to the UserLibrary I made:    

    hive-exec-0.12.0.jar     which contains the packages for ....hive.ql.....  
    httpclient-4.2.5.jar     which contains the packages for ....http.impl.client....  

After adding `httpclient`, there is still a reference error for

    httpClient.addRequestInterceptor   - message =  The type org.apache.http.HttpRequestInterceptor cannot be resolved. It is indirectly referenced from required .class files

I tracked this to:  `httpcore-4.2.4.jar`

The standalone Java program successfully connected to the Impala daemon. I added these jars to the Tomcat /WEB-INF/lib/ folder, along with the JDBC packages as Java source code. Tomcat then had problems, including failing to publish with an IndexOutOfBounds exception. I found this was primarily due to the `hadoop-common` jar. I was able to get it to publish by completely cleaning the Tomcat definition and the project, and then adding this jar as the last one. It still took a few attempts.
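For reference, a standalone test of this kind can be as small as the sketch below. The host, port 21050, and the `auth=noSasl` URL suffix are assumptions for an unsecured Impala daemon, not values from my setup, and `ImpalaJdbcSmokeTest` is just an illustrative name. It catches `Throwable` rather than `Exception` because a class that is missing at run time surfaces as `NoClassDefFoundError`, which is an `Error`; that could explain failures that an ordinary `catch (Exception e)` never sees.

```java
import java.sql.Connection;
import java.sql.DriverManager;

class ImpalaJdbcSmokeTest {

    // Hive2 JDBC driver class, from the org.apache.hive.jdbc package.
    static final String DRIVER_CLASS = "org.apache.hive.jdbc.HiveDriver";

    // Attempts one connection and returns a one-line description of the outcome.
    static String tryConnect(String url) {
        try {
            // Explicit load, so a missing driver jar fails here with a clear message.
            Class.forName(DRIVER_CLASS);
            try (Connection conn = DriverManager.getConnection(url)) {
                return "connected";
            }
        } catch (Throwable t) {
            // Throwable, not Exception: NoClassDefFoundError is an Error.
            return "failed: " + t.getClass().getName() + ": " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        // Assumed URL for an unsecured Impala daemon; replace host and port as needed.
        String url = args.length > 0
                ? args[0]
                : "jdbc:hive2://localhost:21050/;auth=noSasl";
        System.out.println(tryConnect(url));
    }
}
```

Run with the driver and its support jars on the classpath, the printed line names the first class that cannot be found, if any.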

Then, running the application also stopped with exceptions at DriverManager.getConnection(connstr). The errors were always related to not seeing various classes that were in the jar files loaded into the /WEB-INF/lib/ folder. Deleting and then re-importing them walked through these errors in the following sequence:  

    org/apache/http/client/httpclient      (in httpclient jar)  
    org/apache/http/httprequestinterceptor (in httpcore jar)  
    org/apache/hadoop/conf/configuration   (in hadoop-common jar)  
    org/apache/hadoop/hive/conf/hiveconf   (in hive-common AND hive-exec jars)  
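Rather than hitting these one at a time, the classes can be probed by name up front. The following is a minimal sketch, assuming it is run from inside the web application (for example from a test servlet); `ClasspathProbe` and `findMissing` are hypothetical names, and the class names are the fully qualified forms of the four entries above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClasspathProbe {

    // Returns the names that the given class loader cannot resolve.
    static List<String> findMissing(ClassLoader loader, List<String> classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList(
                "org.apache.http.client.HttpClient",
                "org.apache.http.HttpRequestInterceptor",
                "org.apache.hadoop.conf.Configuration",
                "org.apache.hadoop.hive.conf.HiveConf");
        // Inside Tomcat, the context class loader is normally the web application's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println("missing: " + findMissing(loader, required));
    }
}
```

An empty list means every class is visible to that loader; anything listed points at a jar that is absent or not being picked up from /WEB-INF/lib/.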

A final odd error was for `HIVE_CLI_SERVICE_PROTOCOL_V6`. This one was hard to track down until I saw in HiveConnection.java that the `HIVE_CLI_SERVICE_PROTOCOL_Vx` values are added to a collection called supportedProtocols. That collection is used to check the response from Impala: if Impala reports a protocol version that is not in the list, the connection is cancelled. In the version I have, V1-V3 are used. The most recent source on SVN goes up to V7, but these later versions are not used in the JDBC code shipped with CDH5.
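The check can be pictured with the sketch below. This is not the Hive source: `ProtocolVersion` is a hypothetical stand-in for the Thrift-generated enum, and `isAccepted` is an illustrative helper. It only models the idea that the client keeps a fixed set of versions (V1-V3 here, matching the driver I have) and rejects anything else the server reports.

```java
import java.util.EnumSet;
import java.util.Set;

class ProtocolCheckSketch {

    // Hypothetical stand-in for the Thrift-generated protocol version enum.
    enum ProtocolVersion {
        HIVE_CLI_SERVICE_PROTOCOL_V1,
        HIVE_CLI_SERVICE_PROTOCOL_V2,
        HIVE_CLI_SERVICE_PROTOCOL_V3,
        HIVE_CLI_SERVICE_PROTOCOL_V4,
        HIVE_CLI_SERVICE_PROTOCOL_V5,
        HIVE_CLI_SERVICE_PROTOCOL_V6,
        HIVE_CLI_SERVICE_PROTOCOL_V7
    }

    // Versions this client build accepts; V1-V3 is an assumption mirroring my driver.
    static final Set<ProtocolVersion> SUPPORTED = EnumSet.range(
            ProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V1,
            ProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V3);

    // Returns true only if the server's reported version is in the supported set.
    static boolean isAccepted(String serverVersionName) {
        try {
            return SUPPORTED.contains(ProtocolVersion.valueOf(serverVersionName));
        } catch (IllegalArgumentException e) {
            // A name this client has never heard of is treated as unsupported.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("HIVE_CLI_SERVICE_PROTOCOL_V1"));
        System.out.println(isAccepted("HIVE_CLI_SERVICE_PROTOCOL_V6"));
    }
}
```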


I then did another cycle of removing everything, adding the library jars back, and adding the JDBC source code (so I did not add the JDBC jar) as packages in my Tomcat Java/src folder:

    org.apache.hadoop.hive.jdbc     (Hive 1 driver)  
    org.apache.hive.jdbc            (Hive 2 driver)  

At long last, I was able to step through HiveConnection in DEBUG. The communication succeeded and I could check the response from Impala. It responded that it was using the V1 protocol, so it was accepted.

The documentation states that only some of the jars are needed, but I found that several more were needed just to satisfy the import statements. How can I configure Tomcat so that I can use just the distributed JDBC driver and its support jars? I suspect there may be an issue in how Tomcat does class loading, since I noticed the killer exception occurring in catalina.jar, in its class loader (I have no source code for this, so I do not know exactly where or why).
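To narrow down the class loading side, a diagnostic like the sketch below could be run inside the web application. It is only an illustration with hypothetical names (`ClassOriginReport`, `originOf`, `loaderChain`); it reports which jar or directory a class was actually loaded from and lists the chain of class loaders from the current one up to the bootstrap loader.

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

class ClassOriginReport {

    // Returns the jar or directory a class was loaded from, or a note for JDK classes.
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    // Walks from the given loader up through its parents; null means the bootstrap loader.
    static List<String> loaderChain(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("(bootstrap)");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(originOf(String.class));
        System.out.println(originOf(ClassOriginReport.class));
        System.out.println(loaderChain(Thread.currentThread().getContextClassLoader()));
    }
}
```

Calling `originOf` on a driver class that does load would show whether it came from /WEB-INF/lib/ or from somewhere else on Tomcat's classpath, which helps spot duplicate or conflicting jars.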

  [1]: http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...