Continuing my series of how-to articles for CDP, today we explore how to connect to Impala via JDBC from Python. In my example, I use a Jupyter notebook running in CML, but the same approach works in other Python environments.

This process is actually fairly easy, so let's dive in.

Step 1: Set up Impala JDBC drivers

  1. First, download the latest Impala JDBC drivers (Cloudera JDBC Driver 2.6.17 for Impala at the time of writing).
  2. Then, upload them to your machine. Here is an example of a CML Jupyter session with the jars uploaded:
    [Screenshot: CML Jupyter session with the Impala JDBC jars uploaded]
  3. Finally, set up your CLASSPATH by opening a terminal session and running the following:
    CLASSPATH=.:/home/cdsw/ImpalaJDBC4.jar:/home/cdsw/ImpalaJDBC41.jar:/home/cdsw/ImpalaJDBC42.jar
    export CLASSPATH
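A terminal `export` only affects that shell and the processes started from it, so a notebook kernel that is already running does not see the new CLASSPATH. Below is a minimal sketch of setting the same value from inside the notebook, assuming the jars were uploaded to `/home/cdsw` as above; `build_classpath` and `missing_jars` are hypothetical helper names. To our understanding, JayDeBeApi reads `CLASSPATH` from the environment when it starts the JVM, and the JVM's class path is fixed once it is running, so this has to run before the first `jaydebeapi.connect()` call (or after a kernel restart).

```python
import os

# Jar files from Step 1; adjust the directory if you uploaded them elsewhere.
JAR_DIR = "/home/cdsw"
JAR_NAMES = ["ImpalaJDBC4.jar", "ImpalaJDBC41.jar", "ImpalaJDBC42.jar"]


def build_classpath(jar_dir, jar_names):
    """Return a CLASSPATH string: the current directory followed by each jar."""
    entries = ["."] + [os.path.join(jar_dir, name) for name in jar_names]
    return os.pathsep.join(entries)


def missing_jars(jar_dir, jar_names):
    """List the jars that are not present on disk, so typos surface early."""
    return [name for name in jar_names
            if not os.path.isfile(os.path.join(jar_dir, name))]


if __name__ == "__main__":
    missing = missing_jars(JAR_DIR, JAR_NAMES)
    if missing:
        print("Missing jars:", missing)
    # Assumption: JayDeBeApi picks up CLASSPATH when it starts the JVM,
    # so set it before the first jaydebeapi.connect() call in this kernel.
    os.environ["CLASSPATH"] = build_classpath(JAR_DIR, JAR_NAMES)
    print(os.environ["CLASSPATH"])
```

The jar passed as the last argument to `jaydebeapi.connect()` in Step 3 is also added to the class path, which is why the sample can work even when the terminal CLASSPATH is not visible to the kernel.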

Step 2: Install JayDeBeApi

  1. To install JayDeBeApi, run the following:
    pip3 install JayDeBeApi
  2. To avoid an error such as "AttributeError: type object 'java.sql.Types' has no attribute '__javaclass__'", downgrade JPype1 (the Python-to-Java bridge that JayDeBeApi uses) by running the following:
    pip3 install --upgrade jpype1==0.6.3 --user
  3. Restart your kernel after the downgrade so that the older JPype1 is loaded.
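The `pip3` on your terminal's PATH is not guaranteed to belong to the same Python that runs your notebook kernel, so it can be worth checking from inside the notebook which versions that interpreter finds. Below is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, and the package names are the ones from the `pip3` commands above. It reads package metadata from disk, so it reports the new version even before the restart; the restart is still what replaces a JPype1 module that the kernel has already imported.

```python
from importlib import metadata

# Version pinned in the downgrade step above.
EXPECTED_JPYPE = "0.6.3"


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    jaydebeapi_version = installed_version("JayDeBeApi")
    jpype_version = installed_version("JPype1")
    print("JayDeBeApi:", jaydebeapi_version)
    print("JPype1:", jpype_version)
    if jpype_version != EXPECTED_JPYPE:
        # Likely installed into a different interpreter than this kernel's.
        print("JPype1 is not", EXPECTED_JPYPE, "for this kernel; rerun the downgrade.")
```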

Step 3: Connect to Impala

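  1. Finally, connect to Impala using the following sample code:
    import jaydebeapi

    # Driver class, JDBC URL, credentials, and the path to the driver jar
    conn = jaydebeapi.connect("com.cloudera.impala.jdbc.DataSource",
                              "jdbc:impala://[your_host]:443/;ssl=1;transportMode=http;httpPath=icml-data-mart/cdp-proxy-api/impala;AuthMech=3;",
                              {'UID': "[your_cdp_user]", 'PWD': "[your_workload_pwd]"},
                              '/home/cdsw/ImpalaJDBC41.jar')
    curs = conn.cursor()

    # Run a query and fetch every row; print() displays the result even
    # when this is not the last line of the notebook cell
    curs.execute("select * from default.locations")
    rows = curs.fetchall()
    print(rows)

    # Release the cursor and the JDBC connection
    curs.close()
    conn.close()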

    Note: You can get your Impala JDBC URL either from the Data Hub endpoint path or from the JDBC URL provided by CDW (Cloudera Data Warehouse).
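The JDBC URL in the sample is the host and port followed by `key=value;` properties, and the `httpPath` value depends on where your endpoint comes from (a commenter on this article reports needing `cliservice` instead). Below is a minimal sketch that assembles the URL from its parts and closes the cursor and connection even if the query raises; `build_impala_url` and `run_query` are hypothetical helpers, and their defaults simply mirror the sample above rather than being required settings.

```python
from contextlib import closing

# Hypothetical helpers; values mirror the sample connection above.
DRIVER_CLASS = "com.cloudera.impala.jdbc.DataSource"
DRIVER_JAR = "/home/cdsw/ImpalaJDBC41.jar"


def build_impala_url(host, http_path, port=443):
    """Assemble an Impala JDBC URL with the same properties as the sample."""
    props = {
        "ssl": "1",
        "transportMode": "http",
        "httpPath": http_path,
        "AuthMech": "3",  # 3 = user name and password, as in the sample
    }
    prop_str = "".join(f"{key}={value};" for key, value in props.items())
    return f"jdbc:impala://{host}:{port}/;{prop_str}"


def run_query(sql, url, user, password):
    """Run one query and return all rows; cursor and connection are always closed."""
    import jaydebeapi  # imported here so build_impala_url works without it

    with closing(jaydebeapi.connect(DRIVER_CLASS, url,
                                    {"UID": user, "PWD": password},
                                    DRIVER_JAR)) as conn:
        with closing(conn.cursor()) as curs:
            curs.execute(sql)
            return curs.fetchall()
```

For example, `build_impala_url("[your_host]", "icml-data-mart/cdp-proxy-api/impala")` reproduces the URL from the sample, and swapping the second argument is all that changes when your endpoint uses a different `httpPath`.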

The following is a screenshot of my code in action:

[Screenshot: the sample code running in a CML Jupyter notebook]
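`fetchall()` returns each row as a sequence of values without column names. JayDeBeApi follows the Python DB-API 2.0 interface, where `cursor.description` holds one entry per result column and the first item of each entry is the column name. Below is a minimal sketch with the hypothetical helper `fetch_as_dicts`; it is exercised here with the standard library's `sqlite3` only because that needs no server, and it should behave the same with the Impala cursor from Step 3, assuming the driver populates `description`.

```python
import sqlite3


def fetch_as_dicts(cursor):
    """Return the remaining rows of an executed DB-API cursor as dicts keyed by column name."""
    # Assumption: the driver fills cursor.description after execute().
    columns = [entry[0] for entry in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    # Stand-in table so the sketch runs anywhere; with Impala, pass the
    # cursor right after curs.execute("select * from default.locations").
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("create table locations (id integer, city text)")
    cur.executemany("insert into locations values (?, ?)",
                    [(1, "Paris"), (2, "Austin")])
    cur.execute("select * from locations order by id")
    print(fetch_as_dicts(cur))  # [{'id': 1, 'city': 'Paris'}, {'id': 2, 'city': 'Austin'}]
    cur.close()
    conn.close()
```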

Comments
New Contributor

Is there any way to have CDSW connect to Impala and run plain SQL?

I have searched for tutorials and suggestions on the web and have found none that work with CDSW in our environment.

Contributor

Hello,

Nice tutorial, this library is fast!

If anyone is running into the following error:

java.sql.SQLExceptionPyRaisable: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500605)  Error occurred while opening a session with the server. No additional detail from the server regarding this error is available. Please ensure that the driver configuration is compatible with the server configuration. This type of error can also occur when the server is too busy to handle the request. Please try again later.

I was able to fix it by changing the httpPath parameter in the Impala JDBC URL from "icml-data-mart/cdp-proxy-api/impala" to "cliservice", as follows:

"jdbc:impala://"+os.environ["IMPALA_HOST"]+":443/;ssl=1;transportMode=http;httpPath=cliservice;AuthMech=3;"

Hope this helps!

New Contributor

Hello, I am running this from the company network, and I believe we have some sort of certificate for using cloudera-impala. When I copy the URL from impala_prod, it also includes at the end a UID (which is my ID) and a password, which is a standard password (not one I ever set).

So when I run this script, this is the error I receive:

java.sql.SQLException: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500170) Error occurred while setting up ALTUS Dynamic Discovery: Unable to load credentials from provider files.

Do you have any idea how I can fix this?