Cloudera Employee

Continuing my series of how-to articles for CDP, today we explore how to connect to Impala via JDBC from Python. In my example, I will use a Jupyter notebook running in CML, but the same steps apply to other Python environments.

This process is actually fairly easy, so let's dive in.

Step 1: Set up Impala JDBC drivers

  1. First, download the latest Impala JDBC drivers from Cloudera JDBC Driver 2.6.17 for Impala.
  2. Then, upload them to your machine. Here is an example of a CML Jupyter session with the jars uploaded:
    Screen Shot 2020-05-21 at 9.07.59 AM.png
  3. Finally, make sure that you set up your CLASSPATH properly by opening a terminal session and typing the following:
    CLASSPATH=.:/home/cdsw/ImpalaJDBC4.jar:/home/cdsw/ImpalaJDBC41.jar:/home/cdsw/ImpalaJDBC42.jar
    export CLASSPATH
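The CLASSPATH value above can also be assembled and sanity-checked from Python. The following is a minimal sketch, not part of the original setup: the helper name build_classpath is hypothetical, and the jar paths are simply the ones used in this article. Setting the variable from Python is assumed to matter only if it happens before the JVM is started in that process.

```python
import os

def build_classpath(jars, include_cwd=True):
    """Join jar paths into a CLASSPATH string and list any jars that are missing."""
    missing = [jar for jar in jars if not os.path.isfile(jar)]
    entries = (["."] if include_cwd else []) + list(jars)
    return os.pathsep.join(entries), missing

# Jar locations as used in this article; adjust to where you uploaded them.
jars = [
    "/home/cdsw/ImpalaJDBC4.jar",
    "/home/cdsw/ImpalaJDBC41.jar",
    "/home/cdsw/ImpalaJDBC42.jar",
]

classpath, missing = build_classpath(jars)
if missing:
    # Report the problem instead of exporting a CLASSPATH that points nowhere.
    print("Missing jars:", missing)
else:
    os.environ["CLASSPATH"] = classpath
```

Checking for missing jars first tends to produce a clearer message than the class-loading error that would otherwise appear at connection time.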

Step 2: Install JayDeBeApi

  1. To install JayDeBeApi, run the following:
    pip3 install JayDeBeApi
  2. To avoid an error along the lines of "AttributeError: type object 'java.sql.Types' has no attribute '__javaclass__'", it is recommended to downgrade JPype by running the following:
    pip3 install --upgrade jpype1==0.6.3 --user
  3. Restart your kernel after you perform the downgrade.
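After restarting the kernel, it can be useful to confirm which versions were actually picked up. The following is a small sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Distribution names as they are published on PyPI.
for name in ("JayDeBeApi", "JPype1"):
    print(name, installed_version(name))
```

If JPype1 does not report the pinned version, the downgrade most likely landed in a different Python environment than the one the kernel uses.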

Step 3: Connect to Impala

  1. Finally, connect to your Impala instance using the following sample code:
    import jaydebeapi

    # Arguments: driver class name, JDBC URL, credentials, path to the driver jar
    conn = jaydebeapi.connect("com.cloudera.impala.jdbc.DataSource",
                              "jdbc:impala://[your_host]:443/;ssl=1;transportMode=http;httpPath=icml-data-mart/cdp-proxy-api/impala;AuthMech=3;",
                              {'UID': "[your_cdp_user]", 'PWD': "[your_workload_pwd]"},
                              '/home/cdsw/ImpalaJDBC41.jar')
    curs = conn.cursor()

    # Run a query and print the rows it returns
    curs.execute("select * from default.locations")
    print(curs.fetchall())

    # Release the cursor and the connection
    curs.close()
    conn.close()

    Note: You can get your Impala JDBC string either from the Datahub endpoint path or from the JDBC URL shown in CDW.
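Because the host and httpPath differ between deployments, it can help to assemble the JDBC string from its parts rather than editing one long literal. The following is a minimal sketch; the function build_impala_url is hypothetical and only reproduces the options that appear in the sample URL above (ssl, transportMode, httpPath, AuthMech).

```python
def build_impala_url(host, http_path, port=443):
    """Assemble an Impala JDBC URL using the same options as the sample code."""
    options = {
        "ssl": "1",
        "transportMode": "http",
        "httpPath": http_path,
        "AuthMech": "3",
    }
    # Each option is rendered as key=value followed by a semicolon.
    option_str = "".join(f"{key}={value};" for key, value in options.items())
    return f"jdbc:impala://{host}:{port}/;{option_str}"

# Placeholders as in the sample code; substitute your own host and path.
print(build_impala_url("[your_host]", "icml-data-mart/cdp-proxy-api/impala"))
```

The returned string can be passed as the second argument to jaydebeapi.connect in place of the hard-coded URL.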

The following is a screenshot of my code in action:

Screen Shot 2020-05-21 at 9.22.23 AM.png
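The explicit close() calls in the sample code are skipped if the query raises an exception. The following sketch shows one way to guarantee cleanup using contextlib.closing, which works with any DB-API style connection. The helper name fetch_all is hypothetical, and sqlite3 is used here purely as a stand-in so the example runs without an Impala cluster; with Impala, the connection would come from jaydebeapi.connect as shown earlier.

```python
import sqlite3
from contextlib import closing

def fetch_all(conn, sql):
    """Run a query and return all rows, closing the cursor even on error."""
    with closing(conn.cursor()) as curs:
        curs.execute(sql)
        return curs.fetchall()

# Stand-in connection and table (assumption: sqlite3 replaces Impala for this demo).
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("create table locations (id integer, name text)")
    conn.execute("insert into locations values (1, 'example')")
    rows = fetch_all(conn, "select * from locations")
    print(rows)
```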

Comments
New Contributor

Is there any way to have CDSW connect to Impala to run straight SQL?

I have searched many tutorials and suggestions on the web and have found none that work with CDSW in our environment.

Cloudera Employee

Hello,

Nice tutorial, this library is fast!

If anyone is running into

java.sql.SQLExceptionPyRaisable: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500605)  Error occurred while opening a session with the server. No additional detail from the server regarding this error is available. Please ensure that the driver configuration is compatible with the server configuration. This type of error can also occur when the server is too busy to handle the request. Please try again later.

I was able to fix it by changing the httpPath parameter in the Impala JDBC URL from "icml-data-mart/cdp-proxy-api/impala" to "cliservice", as follows:

"jdbc:impala://"+os.environ["IMPALA_HOST"]+":443/;ssl=1;transportMode=http;httpPath=cliservice;AuthMech=3;"

Hope this helps someone!