How to configure Impala/Hive2 JDBC driver in Apache Tomcat without ClassNotFound exceptions

I am attempting to add the Impala/Hive2 JDBC driver to an existing application that is deployed with Tomcat 7 (7.0.54). Other JDBC drivers (Oracle, MS SQL Server, Sybase) work fine.

However, the driver never connected and always threw exceptions that were not trapped by Java (JDK 1.7.0_55) in Eclipse (Kepler). I am working on a Lenovo T500 laptop running Windows 8.1. There were no meaningful error messages or stack traces.

To find out where the errors occur, I got the Hive source code and wrote a simple Java program to test the JDBC connection, stepping through it in DEBUG. There were reference errors for missing classes that are not in the jars specified in [Cloudera's documentation][1]. The errors are in HiveConnection.java (org.apache.hive.jdbc) for:

    import org.apache.hadoop.hive.ql.session.SessionState;  
    import org.apache.http.impl.client.DefaultHttpClient;  


I tracked down where these classes exist and added the following jars to the UserLibrary I made:    

    hive-exec-0.12.0.jar     which contains the packages for ....hive.ql.....  
    httpclient-4.2.5.jar     which contains the packages for ....http.impl.client....  

After adding `httpclient`, there is still a reference error for

    httpClient.addRequestInterceptor   - message =  The type org.apache.http.HttpRequestInterceptor cannot be resolved. It is indirectly referenced from required .class files

I tracked this to:  `httpcore-4.2.4.jar`

The standalone Java program successfully connected to the Impala daemon. I then added these jars to the Tomcat /WEB-INF/lib/ folder, along with the JDBC packages as Java source code. Tomcat had problems, including failing to publish with an IndexOutOfBounds exception. I found this was primarily due to the `hadoop-common` jar. I was able to get it to publish by completely cleaning the Tomcat definition and the project, and then adding this jar last. It still took a few attempts.
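For anyone reproducing this, a standalone test of the kind described above might look roughly like the sketch below. It is not the exact program used here. The host name is hypothetical, and the port `21050` and the `;auth=noSasl` suffix assume an unsecured Impala daemon on its usual JDBC port. Note that `NoClassDefFoundError` is an `Error`, not an `Exception`, so a plain `catch (Exception e)` will not trap it. That may be one reason class loading failures seem to escape normal error handling.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ImpalaJdbcSmokeTest {

    // Builds a Hive2-style JDBC URL; ";auth=noSasl" assumes an unsecured cluster.
    static String buildUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/;auth=noSasl";
    }

    public static void main(String[] args) {
        // Hypothetical host name; 21050 is assumed to be the Impala daemon's JDBC port.
        String host = args.length > 0 ? args[0] : "impala-host.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 21050;
        String url = buildUrl(host, port);
        try {
            // Hive2 driver class from the org.apache.hive.jdbc package.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println("Connected; SELECT 1 returned " + rs.getInt(1));
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        } catch (LinkageError e) {
            // Covers NoClassDefFoundError, which catch (Exception e) would miss.
            System.err.println("Class loading failed: " + e);
            e.printStackTrace();
        }
    }
}
```

Running it with the same jars that go into /WEB-INF/lib/ on the classpath shows whether the jar set is complete before Tomcat is involved.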

Then, running the application also stopped with exceptions at DriverManager.getConnection(connstr). The errors were always related to not seeing various classes that were in the jar files loaded into the /WEB-INF/lib/ folder. Deleting and then re-importing them walked through these errors in the following sequence:  

    org/apache/http/client/HttpClient      (in httpclient jar)  
    org/apache/http/HttpRequestInterceptor (in httpcore jar)  
    org/apache/hadoop/conf/Configuration   (in hadoop-common jar)  
    org/apache/hadoop/hive/conf/HiveConf   (in hive-common AND hive-exec jars)  

The final error was an odd one involving `HIVE_CLI_SERVICE_PROTOCOL_V6`. It was hard to track down until I saw in HiveConnection.java that the `HIVE_CLI_SERVICE_PROTOCOL_Vx` values are added to a collection called supportedProtocols. This collection is used to check the response from Impala: if Impala uses a protocol version that is not in the list, the connection is cancelled. In the version I have, V1-V3 are used. The most recent source on SVN goes up to V7, but those versions are not used in the JDBC code shipped with CDH5.
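To illustrate the check described above, here is a simplified sketch. It is not the actual HiveConnection code. The enum is a stand-in for the real generated protocol version type, and the class and method names are made up for the example.

```java
import java.util.EnumSet;
import java.util.Set;

class ProtocolCheckSketch {

    // Stand-in for the real protocol version type; only the constant names mirror the source.
    enum ProtocolVersion {
        HIVE_CLI_SERVICE_PROTOCOL_V1,
        HIVE_CLI_SERVICE_PROTOCOL_V2,
        HIVE_CLI_SERVICE_PROTOCOL_V3,
        HIVE_CLI_SERVICE_PROTOCOL_V4,
        HIVE_CLI_SERVICE_PROTOCOL_V5,
        HIVE_CLI_SERVICE_PROTOCOL_V6,
        HIVE_CLI_SERVICE_PROTOCOL_V7
    }

    // A client built against V1-V3, like the driver version described in this post.
    static final Set<ProtocolVersion> SUPPORTED = EnumSet.range(
            ProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V1,
            ProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V3);

    // The connection is accepted only if the server's version is in the supported set.
    static boolean accepts(ProtocolVersion serverVersion) {
        return SUPPORTED.contains(serverVersion);
    }

    public static void main(String[] args) {
        for (ProtocolVersion v : ProtocolVersion.values()) {
            System.out.println(v + " accepted: " + accepts(v));
        }
    }
}
```

With this shape, a client and supporting jars built from different Hive versions can disagree about which constants exist, which would be consistent with an error naming a version such as V6.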


I did another cycle of removing everything, then added the library jars and the JDBC source code (so I did not add the JDBC jar) as packages in my Tomcat Java/src folder:  

    org.apache.hadoop.hive.jdbc     (Hive 1 driver)  
    org.apache.hive.jdbc            (Hive 2 driver)  

At long last, I was able to step through HiveConnection in DEBUG, successfully made the connection, and could check the response from Impala. It responded that it was using the ____V1 protocol, so it was accepted.

The documentation states that only some of the jars are needed, but I found several more were needed just to satisfy the import statements. How can I configure Tomcat so that I can use just the distributed JDBC driver and its support jars? I suspect there may be an issue in how Tomcat does class loading, since I noticed the killer exception occurring in Catalina.jar in its classloader (I have no source code for this, so I do not know exactly where or why).

  [1]: http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...

1 ACCEPTED SOLUTION

Hello @gmalafsky,

Thank you for raising this question about how to configure the Impala JDBC driver on a Windows machine.

Although the original question was raised some time ago, I would like to update this thread with the latest information.

The installation guide for the latest Impala JDBC driver release is available in PDF format at [1]. Page 8 states: "Before you use the Cloudera JDBC Driver for Impala, the JDBC application or Java code that you are using to connect to your data must be able to access the driver JAR files. In the application or code, specify all the JAR files that you extracted from the ZIP archive."

For Java 7, please follow the guide at [2] to configure the classpath correctly.

For detailed instructions, please see the section "Installing and Using the Cloudera JDBC Driver for Impala" in [1].

For release notes, please see [3].

Please let us know if any additional information is required for this thread to be marked as solved.

Kind regards,

Ferenc

[1] https://docs.cloudera.com/documentation/other/connectors/impala-jdbc/latest/Cloudera-JDBC-Driver-for...

[2] http://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.html

[3] https://docs.cloudera.com/documentation/other/connectors/impala-jdbc/latest.html


Ferenc Erdelyi, Technical Solutions Manager
